Parallel processing system

ABSTRACT

Software development tools and techniques for configuring parallel processing systems to execute software modules implementing processes for solving complex problems, including over-the-counter trading processes and foreign exchange trading processes, to execute quickly and efficiently. The parallel processing system may include low-cost, consumer-grade multicore processing units. A process for solving a complex problem may be divided into software modules, including by evaluating the process to determine discrete processing steps that produce an intermediate result on which later steps of the process depend. The software modules created for a process may form a template processing chain describing multiple processing chains of the process that are to be executed. A software development tool for producing configuration information for multicore processing units may evaluate the software modules and the processing chains to determine whether the modules will execute quickly and efficiently on the multicore processing units of the parallel processing system.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of and claims the benefit under 35 U.S.C. §120 of U.S. patent application Ser. No. 13/555,027, titled “PARALLEL PROCESSING SYSTEM,” filed Jul. 20, 2012, which is incorporated herein by reference in its entirety.

BACKGROUND

Commercial trading, such as trading in financial markets and trading of financial products, typically takes one of two forms: exchange trading and non-exchange trading.

Exchange trading occurs with the assistance of a public exchange, in which buyers and sellers openly advertise availability of trades and the prices at which the trades may be made. Because of the public nature of exchanges, trades of the same items (e.g., the same stock for the same company) that occur at the same time typically occur for the same price or roughly the same price. Stock exchanges like the New York Stock Exchange (NYSE), in which stocks are traded publicly and are available at a publicly-advertised price, are an example of an exchange.

Non-exchange trades, on the other hand, are not public and are not advertised, but instead occur privately between two parties. In a non-exchange trade, one party may privately offer a trade to another party and the trade may be executed when the other party accepts, without anyone else being notified of the trade, the item being traded, or the price. The private nature of the trades may lead to trades for the same item at the same time being carried out at different prices when different pairs of parties are involved. In some cases, one seller may offer the same item to different buyers at different prices at the same time, because the privacy of the trading decreases the risk that the buyers will discover the different pricing. Similarly, buyers may receive offers for trades of the same item at the same time from different sellers with different prices. Non-exchange trades are also commonly known as over-the-counter (OTC) trades.

One example of OTC trading is foreign exchange trading, also called FX trading or “forex” trading. In foreign exchange trading, one party may offer to another to trade one form of currency (e.g., one nation's currency) for another form of currency (e.g., another nation's currency) at a rate of exchange between the two currencies set by the seller. Many different banks and other financial institutions engage in foreign exchange trading and the exchange rates for foreign exchange trading may vary widely. A buying or selling party may set exchange rates for each pair of currencies individually, without regard to whether there is consistency in or equivalence between the parties' exchange rates for multiple sets of currencies.

The differences in pricing between parties for OTC trades may create an opportunity for profit through multiple trades of items to multiple parties at different prices. When these multiple trades for profit are carried out in the foreign exchange market, this is known as “financial arbitrage.” Triangular arbitrage is a form of financial arbitrage in which a party trades between three different forms of currency, often with multiple different parties, to realize a profit. FIG. 1 illustrates an example of a triangular arbitrage. In the arbitrage 100 of FIG. 1, a first party begins with US$1 million and receives an offer for transaction 102 from a second party indicating that the second party will trade euros for the U.S. dollars at an exchange rate of 1.35225 USD/EUR. When the first party carries out this transaction 102, the first party possesses €739,508.23. The first party may then receive another offer for a transaction 104 from a third party indicating that the third party will trade British pounds for euros at an exchange rate of 0.68211 GBP/EUR. When the first party carries out this transaction 104, the first party possesses £504,425.96. The first party may then receive another offer for a transaction 106 from a fourth party indicating that the fourth party will trade U.S. dollars for British pounds at an exchange rate of 2.00 USD/GBP. When the first party carries out this last transaction 106, the first party again possesses U.S. dollars, but has US$1,008,851.91 following the series of trades, where the first party originally had US$1,000,000, resulting in a net profit from the arbitrage of US$8,851.91.
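The arithmetic of the three transactions can be checked with a short sketch. The function name `run_triangle` and its parameter names are illustrative only and do not appear in the figures; the rates and the starting amount are those stated for transactions 102, 104, and 106. Decimal arithmetic is used so that rounding to two places behaves predictably.

```python
from decimal import Decimal


def run_triangle(start_usd, usd_per_eur, gbp_per_eur, usd_per_gbp):
    """Return (euros, pounds, final_usd) for a USD -> EUR -> GBP -> USD loop."""
    euros = start_usd / usd_per_eur    # transaction 102: dollars are exchanged for euros
    pounds = euros * gbp_per_eur       # transaction 104: euros are exchanged for pounds
    final_usd = pounds * usd_per_gbp   # transaction 106: pounds are exchanged for dollars
    return euros, pounds, final_usd


start = Decimal("1000000")
euros, pounds, final_usd = run_triangle(
    start, Decimal("1.35225"), Decimal("0.68211"), Decimal("2.00")
)
profit = final_usd - start

# Amounts rounded to two decimal places, as in the description of FIG. 1.
print(round(euros, 2), round(pounds, 2), round(final_usd, 2), round(profit, 2))
```

Each intermediate amount is carried at full precision and rounded only for display, which is how the stated final amount and net profit are obtained.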

Profit from arbitrage is possible in part because of the differences in exchange rates for currencies between parties that accompany OTC trading. In exchange markets, the prices for transactions are similar between parties at a given time, as discussed above. In an OTC market, in contrast, while exchange rates across the market may be generally consistent, small variations in prices that are established by parties create the potential for profits and create the potential for large profits when the volume of a trade (e.g., the amount of currency exchanged) is large.

SUMMARY

In one embodiment, there is provided a method comprising generating, using at least one processor, a plurality of processing chains for parallel execution on at least one processing unit comprising a plurality of processing cores. The generating comprises generating each of the plurality of processing chains according to a template processing chain and a specification of data to be processed by the plurality of processing chains. The template processing chain comprises a plurality of software modules in which at least one software module receives one or more inputs that are one or more outputs of one or more other software modules of the plurality of software modules, and the specification of data defines a plurality of inputs to be processed by the plurality of processing chains. The generating comprises generating the plurality of processing chains such that each processing chain of the plurality of processing chains is adapted to cause the at least one processing unit to perform operations defined by the plurality of software modules on at least a portion of the data defined by the specification of data. The method further comprises selecting a configuration for the at least one processing unit for executing the plurality of processing chains on the plurality of processing cores. The selecting is carried out based at least in part on an execution efficiency of the plurality of processing chains when the at least one processing unit is configured according to the configuration. The method further comprises producing configuration information for configuring the at least one processing unit according to the configuration.

In another embodiment, there is provided a method comprising evaluating, using at least one processor, a plurality of software modules for execution on at least one processing unit comprising a plurality of processing cores. A first portion of the plurality of software modules receive as inputs outputs generated by a second portion of the plurality of software modules. The evaluating comprises evaluating the plurality of software modules to identify at least one change to the plurality of software modules to increase an execution efficiency of the plurality of software modules when executed in parallel on the at least one processing unit comprising the plurality of processing cores. The method further comprises, in response to the evaluating, automatically editing, using the at least one processor, the plurality of software modules to implement the at least one change to increase the execution efficiency of the plurality of software modules.

In a further embodiment, there is provided a method comprising evaluating, using at least one processor, operations of a plurality of software modules to be executed in parallel on a plurality of processing cores, evaluating, using at least one processor, characteristics of data to be processed by the plurality of software modules, and, based at least in part on the evaluation of the operations and the evaluation of the characteristics, producing configuration information for configuring the plurality of processing cores to execute the plurality of software modules. Producing the configuration information comprises producing configuration information based at least in part on differences between a first type of processing core of the plurality of processing cores and a second type of processing core of the plurality of processing cores.

The foregoing is a non-limiting summary of the invention, which is defined by the attached claims.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:

FIG. 1 is a depiction of a sequence of non-exchange trades carrying out a triangular arbitrage;

FIG. 2 is a block diagram of a computer system in which some embodiments may operate;

FIG. 3 is a block diagram of a computing device with which some embodiments may operate;

FIG. 4A is a block diagram of a software development environment in which some embodiments may operate;

FIG. 4B is a flowchart of a software development process that may be performed in some embodiments;

FIG. 5 is a flowchart of an exemplary process for developing and executing software modules for financial arbitrage;

FIG. 6 is a flowchart of an exemplary process for developing software modules that may be performed in some embodiments;

FIG. 7A is a flowchart of an exemplary process for financial arbitrage that may be performed in some embodiments;

FIG. 7B is a block diagram of processing steps of the exemplary process of FIG. 7A;

FIG. 7C is a block diagram of software modules that may be implemented to carry out operations corresponding to the processing steps of the exemplary process of FIG. 7A;

FIG. 8 is a flowchart of an exemplary process for selecting a configuration for efficiently executing software modules on multicore processing units that may be performed in some embodiments;

FIG. 9 is a flowchart of an exemplary process for evaluating software modules to determine a configuration for efficiently executing the software modules that may be performed in some embodiments;

FIG. 10 is a flowchart of an exemplary process for configuring one or more processing units to execute software modules;

FIG. 11 is a flowchart of an exemplary process for iteratively configuring one or more processing units to execute software modules;

FIG. 12 is a flowchart of an exemplary process for operating one or more processing units to execute software modules of a plurality of processing chains to implement a process for solving a complex problem;

FIG. 13 is a flowchart of an exemplary process for operating one or more processing units to execute software modules to carry out a trading system; and

FIG. 14 is a block diagram of a computing device with which some embodiments may operate.

DETAILED DESCRIPTION

Applicants have recognized and appreciated that though high-frequency trading (HFT) techniques have been developed for exchange-based trades, HFT techniques for OTC trading are lacking. Moreover, Applicants have recognized and appreciated that by providing a framework for programming low-cost hardware, including consumer-grade generally-programmable graphics processing units (GPGPUs) and/or other processing units that include multiple processing cores, high-frequency trading for OTC markets can be enabled both quickly and inexpensively. Such a framework may also enable programming low-cost hardware, such as consumer-grade GPGPUs, to perform processing for producing solutions to complex problems using parallel execution on the processing units.

High-frequency trading (HFT) techniques have been developed for trading of items in exchange markets. HFT allows computing devices, configured with trading systems, to process streams of data regarding prices in the exchange and make trading decisions based on those prices. Trading systems for HFT in exchange markets can be complex due to the breadth of items traded in an exchange market (e.g., the number of stocks offered in an exchange). In exchange markets, though, the types of trades that can be executed, the prices of the trades, and the sources of data are relatively consistent. The nature of exchanges makes the price for each item consistent between parties, and all prices for items in a given exchange can be derived from one common source. Additionally, the trades in exchanges are primarily cash exchanges, which means that an item is associated with one price (i.e., the cost of the item in cash in one currency). Because there is only one type of trade (the cash trade), a trading system for HFT would not have to consider multiple different prices for an item, each associated with a different type of trade. Trading systems for HFT in exchanges therefore may not have to account for types of trades, variations in prices of trades, or sources of data.

OTC trading markets, however, may not have only one type of trade for an item (i.e., may not have only one price for an item), may not have consistent pricing for those items between parties, and may not have prices that can be derived from one common source. Rather, as discussed above, each party that may make a trade in an OTC market may have a different price for an item when traded for multiple other items (e.g., one price for a currency traded with a first currency, and a different price for that currency traded with a second currency, where the two prices are not consistent), and may even have different prices for those trades between counterparties. Additionally, the prices set by a party for trades may be private and may not be obtainable from anywhere other than directly from that party. Still further, prices in OTC trading can be changed by parties widely and quickly, with some prices being known to change multiple times, even hundreds or thousands of times, per second. OTC trading can be far more complex than exchange trading as a result of these extra variables introduced in OTC trading.

Applicants have recognized and appreciated that the complexity of OTC trading has hindered the development of high-frequency trading in the OTC markets. Additionally, this complexity has limited the types of trading conducted in OTC markets. For example, OTC markets enable profits to be made through sequences of trades, such as in the case of triangular arbitrage discussed above in connection with FIG. 1. Identifying a potential profit in a sequence of trades includes analyzing the possible trades that can be made and determining which sequence of those trades would result in a profit. The complexity of OTC trading and the number of variables to be considered have limited this analysis. Traditionally, three transactions, such as in the case of triangular arbitrage, was the maximum number of transactions that could be considered.
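The analysis of possible trade sequences can be pictured, purely for illustration, as a brute-force enumeration. The sketch below is an assumption about how such a search might look and is not the technique described herein. The function `profitable_cycles`, the layout of the `rates` table, and the quoted rates are all hypothetical. Exact fractions are used so that a sequence that merely breaks even is not reported as profitable because of rounding.

```python
from fractions import Fraction
from itertools import permutations


def profitable_cycles(rates, home, max_trades):
    """List trade sequences that start and end in `home` and return more than was paid in.

    `rates[(src, dst)]` is the amount of `dst` received per unit of `src`
    (a hypothetical table of privately quoted rates).
    """
    others = sorted({c for pair in rates for c in pair} - {home})
    found = []
    for hops in range(1, max_trades):              # number of intermediate currencies
        for middle in permutations(others, hops):
            path = (home,) + middle + (home,)
            gain = Fraction(1)
            for src, dst in zip(path, path[1:]):
                rate = rates.get((src, dst))
                if rate is None:                   # no quote for this leg; skip the path
                    gain = None
                    break
                gain *= rate
            if gain is not None and gain > 1:
                found.append((path, gain))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Hypothetical quotes chosen so that exactly one three-trade sequence is profitable.
rates = {
    ("USD", "EUR"): Fraction("0.8"), ("EUR", "USD"): Fraction("1.25"),
    ("EUR", "GBP"): Fraction("0.5"), ("GBP", "EUR"): Fraction("2"),
    ("GBP", "USD"): Fraction("3"), ("USD", "GBP"): Fraction("0.3"),
}
cycles = profitable_cycles(rates, "USD", max_trades=3)
```

The number of candidate sequences grows factorially with the number of currencies and permitted trades, and every quote may change many times per second, which illustrates why the analysis has traditionally been limited to short sequences.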

Applicants have further recognized and appreciated that the complexity and number of variables to be considered for OTC trading have therefore traditionally prevented the development of trading systems for using HFT techniques to execute OTC trades. The complexity of any possible HFT systems for executing OTC trades has meant that such systems would have had to be executed on complex, and costly, hardware. For example, a trading system for HFT trading in OTC markets might have been executed on a large, distributed system of interconnected computers to enable the HFT techniques to be executed. However, programming such hardware is a difficult task, and can be costly. Creating software code that is able to execute on such a platform is difficult and requires specialized training. Additionally, distributed systems are disadvantageous in the OTC market because of the latency of operations for such systems. HFT systems rely on overall execution speed to make trades and realize profits. Delay in execution can result in a missed opportunity for a trade and a missed profit. Distributed systems may require a large amount of space and power for operation and are therefore often implemented on computers or devices that are distant from computers that do not form a part of the distributed system, and are often connected to these other computers via the Internet. Because of the location, transferring data to and from the computers of a distributed system may require an amount of time that is unacceptably large for trading systems using HFT techniques, particularly for OTC trades. The time necessary to transfer trading information from sources of trading information to the distributed system and the time necessary to transfer outputs of the distributed system back to a system for effecting a trade would be too large for trades to be made in OTC markets. Thus, even if such distributed systems could have been programmed to implement a trading system using HFT techniques in OTC markets, the various disadvantages of such systems for operation in OTC markets prevented their use in HFT systems.

Applicants have recognized and appreciated the advantages of enabling trading systems for using HFT techniques in OTC markets to be executed using low-cost, consumer-grade hardware. Consumer-grade multicore processing units, such as central processing units (CPUs) and generally-programmable graphics processing units (GPGPUs), are relatively low cost and may be able to execute complex operations in parallel using the multiple cores. Moreover, such an implementation allows computing devices that identify desirable trades to be implemented in a compact way. As a result, a system for high-frequency trading may be installed in one or a small number of physical devices close to a source of trading information. Applicants have also recognized and appreciated that OTC trading is not the only type of complex problem that may benefit from being executed efficiently in parallel on low-cost, consumer-grade hardware. Many other systems in other contexts, including contexts other than OTC trading and other than financial markets, could benefit from being executed in parallel on low-cost, consumer-grade hardware.

However, consumer-grade hardware can be difficult to program for efficient execution of operations for complex problems. Applicants have recognized that configuring GPGPUs (or other multicore processing units) to execute operations quickly and efficiently has conventionally been performed in an ad hoc way by individual programmers with a deep understanding of the manner in which GPGPUs operate and the manner in which the GPGPUs will process the precise instructions that are to be executed by the GPGPUs. Applicants have also recognized that many programmers do not have this deep understanding of multicore processing units and are not capable of configuring processing units to execute operations quickly and efficiently.

In view of the foregoing, Applicants have recognized the advantages of software development tools that evaluate software modules developed for performing operations for complex problems on multicore processing units. Such software development tools may identify, based on the evaluation, configurations for the multicore processing units that will enable operations for the complex problems to be performed quickly and efficiently on the multicore processing units. Applicants have additionally recognized and appreciated the advantages of making such software development tools generic to particular types of operations to be carried out on target hardware to perform the complex problems. As mentioned above, configuring multicore processing units to execute operations quickly and efficiently often requires a deep understanding of the precise instructions that are to be executed by the multicore processing units and the manner in which those instructions will be executed, as well as hardware characteristics of the multicore processing units. For example, configuring a multicore processing unit to execute operations quickly and efficiently may require knowledge of how one or more types of memory access operation are executed, such as the latency of an operation, or knowledge of memory to be accessed, such as the size of a cache or data transfer speeds for a memory. Therefore, it may be beneficial for software development tools to correspond to specific complex problems and evaluate operations based on information about a specific complex problem to which the operations relate when producing configuration information for multicore processing units. Applicants have recognized, however, that software development tools can be created that are generic to the complex problems to be performed and that are able to evaluate operations that relate to many different complex problems to produce configuration information. In particular, Applicants have recognized and appreciated that by configuring software development tools to evaluate characteristics of software modules in particular ways, such software development tools can produce configuration information for configuring multicore processing units to quickly and efficiently execute software modules regardless of the problem or domain to which the software modules relate.

Accordingly, described herein are software development tools and techniques for creating software modules to implement processes for solving complex problems, including OTC trading processes such as foreign exchange trading processes. In some embodiments, a process for solving a complex problem may be divided into software modules in any suitable manner, including by evaluating the process to determine discrete processing steps that are repeated in the process and that produce an intermediate result of the process on which later steps of the process depend. Such modules may be executable instructions that are arranged in any suitable way. In some cases, the modules may be logical units. A logical unit may be, for example, a logical portion of a larger logical unit or a distinct logical unit. A logical portion of a larger logical unit may be, for example, a function contained within a file, and a distinct logical unit may be, for example, a file.

Software modules created from identifying discrete processing steps of a process may correspond to repeated operations in the process. Accordingly, in some embodiments, software modules that are created for a process may be formed as a template processing chain. The template processing chain may describe processing chains that form a part of the process and that are repeatedly executed in parallel, on different data, to implement the process. Multiple different processing chains may be created from the template processing chain by replicating modules of the template processing chain for the multiple processing chains. The multiple processing chains may be mapped to processing cores of one or more low-cost, consumer-grade multicore processing units to be executed on the cores and implement the process. Types of and sources of data to be provided as input to each processing chain may also be specified. A software development tool for producing configuration information for configuring multicore processing units to efficiently execute the software modules may evaluate the software modules and the data to be processed to identify the configuration information.

In some embodiments, a software development tool may generate multiple processing chains, each corresponding to some of the types and sources of data to be provided as input. In such embodiments, generating the plurality of processing chains may include replicating template software modules of the template processing chain to produce multiple processing chains each including the software modules. When the software modules are replicated, the software development tool may also identify sources of input data for the software modules and destinations of output data for the software modules. Identifying sources of input data may include identifying software modules that accept as input output generated by other software modules. The software development tool may then evaluate the software modules of the multiple processing chains to determine a configuration of the multicore processing unit(s) that will enable efficient execution of the plurality of processing chains. In evaluating execution efficiency, the software development tool may consider differences between cores of processing units on which the software modules may execute. Differences between cores may include differences in capabilities and in configurations of cores, as well as differences in the manner in which cores may execute instructions. Cores for which differences are evaluated may include cores of the same processing unit. By considering the differences between cores, the software development tool may be able to account for these differences in configuring the software modules and/or the cores. By accounting for these differences in the configuration, the software development tool may be able to take advantage of differences between cores in a way that increases speed and/or efficiency of execution of the software modules.
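As a minimal sketch of replicating a template processing chain over a specification of data, the following assumes a deliberately simplified model. The `Module` class, the helper functions, the field names, and the sample values are hypothetical and are not taken from any embodiment. Each module names the inputs it consumes, which may be data fields or outputs of earlier modules, and a thread pool stands in for the processing cores of a multicore processing unit.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Module:
    name: str                  # key under which the module's output is stored
    inputs: Tuple[str, ...]    # data fields or earlier module outputs it consumes
    func: Callable[..., float]


def generate_chains(template: List[Module], data_spec: List[Dict[str, float]]):
    """Replicate the template once per entry of the data specification."""
    return [{"modules": list(template), "data": dict(entry)} for entry in data_spec]


def run_chain(chain):
    """Execute the modules of one chain in order, feeding outputs to later modules."""
    values = dict(chain["data"])
    for module in chain["modules"]:
        args = [values[name] for name in module.inputs]
        values[module.name] = module.func(*args)
    return values


# Hypothetical two-module template: the second module consumes the first's output.
template = [
    Module("euros", ("usd", "usd_per_eur"), lambda usd, rate: usd / rate),
    Module("pounds", ("euros", "gbp_per_eur"), lambda eur, rate: eur * rate),
]
# Hypothetical data specification: one entry per processing chain.
data_spec = [
    {"usd": 100.0, "usd_per_eur": 1.25, "gbp_per_eur": 0.5},
    {"usd": 200.0, "usd_per_eur": 2.0, "gbp_per_eur": 0.25},
]

chains = generate_chains(template, data_spec)
with ThreadPoolExecutor() as pool:   # threads stand in for processing cores
    results = list(pool.map(run_chain, chains))
```

Declaring each module's inputs by name is one way to make the sources of input data explicit, so that a tool could identify which modules depend on the outputs of others without inspecting their code.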

The evaluation for speed and efficiency may be carried out to ensure that the overall set of software modules executes quickly and efficiently, or to ensure that each individual software module executes quickly and efficiently, or based on any other suitable factors relating to efficiency. Execution efficiency of software modules may, in some embodiments, be determined based at least in part on efficiency of time spent executing the software modules. Efficiency of time spent executing the modules may be evaluated based at least in part on an amount of time processing cores spend executing instructions for software modules and/or an amount of time processing cores spend not executing instructions for the software modules. In some embodiments, for example, the ratio of these amounts of time may be calculated in determining execution efficiency. Efficiency of execution of software modules may additionally or alternatively be evaluated in terms of power efficiency of the software modules. Some instructions, when executed by one or more processors, may cause the processor(s) to draw more power than when other instructions are executed. In some cases, the difference in power consumption of the processors when the different instructions are executed may be slight. However, when the instructions are executed many times, such as many times by one processor or many times across many processors, the difference in power consumption may not be negligible and may be significant. Reducing power consumption may reduce costs of executing the instructions. Thus, in some embodiments, efficiency of power consumption may be evaluated in determining the efficiency of execution of the software modules. The evaluation of efficiency may be based on any suitable characteristics of software modules, examples of which are discussed in detail below.
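The time-based measure can be illustrated with a short sketch. It assumes that per-core busy and idle times are already available for each candidate configuration. The function names, the shape of the `measurements` table, and the sample numbers are hypothetical, and summing across cores before taking the ratio is only one of several reasonable choices.

```python
def busy_idle_ratio(busy_seconds, idle_seconds):
    """Ratio of time spent executing module instructions to time spent not executing them."""
    if idle_seconds == 0:
        return float("inf")    # the cores were never idle
    return busy_seconds / idle_seconds


def pick_configuration(measurements):
    """Return the configuration whose cores, taken together, have the highest ratio.

    `measurements` maps a configuration name to a list of (busy, idle) pairs,
    one pair per processing core (hypothetical measurement format).
    """
    def score(name):
        busy = sum(b for b, _ in measurements[name])
        idle = sum(i for _, i in measurements[name])
        return busy_idle_ratio(busy, idle)

    return max(measurements, key=score)


# Hypothetical timings, in seconds, for two candidate configurations of two cores.
measurements = {
    "config_a": [(8.0, 2.0), (6.0, 4.0)],
    "config_b": [(9.0, 1.0), (9.0, 1.0)],
}
best = pick_configuration(measurements)
```

A power-based measure could be substituted by scoring each configuration on estimated energy drawn rather than on time.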

Based at least in part on the evaluation of the software modules, the software development tool may produce configuration information for the multicore processing unit(s). The configuration information may include information to configure hardware and/or software of a multicore processing unit to execute the software modules. The information to configure hardware and/or software may include information to configure a management facility that interacts with hardware of a multicore processing unit. The management facility may be executed by one or more processing units separate from the unit to which the management facility relates. The configuration information may also include information to configure hardware of a multicore processing unit, such as information that is to be written to registers of a multicore processing unit and used by the hardware and/or firmware integrated with the hardware. Configuration information may include any suitable information.

In some embodiments, the configuration information may arrange the multicore processing unit(s) according to one configuration out of multiple different available configurations. The configuration information may include instructions to be executed by one or more cores to implement the software modules. The instructions to be executed by one or more cores may include software modules arranged in an intermediate language. The intermediate language may be one that is not executable on a processing core of a multicore processing unit on which the software modules are to execute. A management facility of the multicore processing unit may be configured to interpret instructions in the intermediate language to create sets of instructions for execution on cores of the processing unit. Configuration information may include information to configure the management facility to perform the interpretation in a particular manner, such as by preferring particular types of instructions over other types of instructions when interpreting the intermediate language and creating sets of instructions. The configuration information may additionally or alternatively identify an interrelationship (e.g., dependency) between software modules and the inputs and outputs of software modules, processing cores of the multicore processing unit(s) to which software modules should be assigned for execution, a relative time at which software modules should execute, and/or any other suitable information.
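To make the last category of configuration information concrete, the sketch below derives a core assignment and a relative execution step for each module from a table of module dependencies. It is an illustration under simplifying assumptions: the module names, the `plan_schedule` function, and the round-robin assignment are hypothetical, and all cores are treated as identical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan_schedule(dependencies, num_cores):
    """Derive a (module, step, core) plan from module dependencies.

    `dependencies` maps each module name to the modules whose outputs it needs.
    Modules that become ready together share a relative step and are spread
    across cores round-robin (a simplification; real cores may differ).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                     # raises CycleError on circular dependencies
    plan = []
    step = 0
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        for index, module in enumerate(ready):
            plan.append({"module": module, "step": step, "core": index % num_cores})
        sorter.done(*ready)
        step += 1
    return plan


# Hypothetical modules: two independent fetches feed a conversion, then a decision.
dependencies = {
    "fetch_a": (),
    "fetch_b": (),
    "convert": ("fetch_a", "fetch_b"),
    "decide": ("convert",),
}
plan = plan_schedule(dependencies, num_cores=2)
```

If more modules become ready at a step than there are cores, this sketch assigns several of them to the same core, where they would have to run one after another; a fuller treatment would weigh differences between cores and expected module run times.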

Once sets of instructions for the software modules are produced based on the intermediate language and the configuration information, the instructions and the configuration information may be used to configure a multicore processing unit. The multicore processing unit(s) may be configured to execute the multiple processing chains based on the sets of instructions and the configuration information. The multicore processing unit(s) may then be operated to execute instructions for the software modules for the multiple processing chains to perform the process to which the software modules and multiple processing chains relate.

In some embodiments, software modules that, when executed by one or more multicore processing units, cause the processing units to implement an OTC trading system, such as a foreign exchange trading system, can be evaluated by such a software development tool. One or more multicore processing units may therefore be configured to implement an OTC trading system by executing multiple software modules in parallel on the processing cores of the multicore processing unit(s). Examples of ways in which a multicore processing unit can be configured to efficiently execute software modules to implement an OTC trading system are discussed below.

Various examples of ways in which techniques described herein may be implemented are described below. It should be appreciated, however, that embodiments are not limited to operating according to any of these exemplary techniques and that other embodiments are possible.

For example, in various embodiments described above and below, software modules may be described as relating to foreign exchange trading. Embodiments are not, however, limited to operating in the foreign exchange trading context or in any OTC trading context, but rather may operate to produce solutions to complex problems in any suitable context. Techniques described herein may be useful in producing solutions to problems in which multiple possible options are to be evaluated quickly and one or more of the options is to be selected based on one or more criteria. Such problems may include those in which data of multiple different types or from multiple different sources are to be evaluated using the same or similar processes. A problem for which the same or similar processes are to be carried out on data may be well suited for the parallel execution and automated creation of software modules as described herein.

As an example of another problem to which the techniques described herein may be applied, in some embodiments software modules may relate to a Black-Scholes process for determining leverage and risk of financial investments. As another example, in other embodiments, software modules may relate to a price cleansing process for determining whether advertised prices for trades are outliers and potentially erroneous. Outside of the financial industry, embodiments may relate to performing navigational routing, including rerouting upon encountering a condition such as traffic on an originally-chosen route. Any suitable complex problem may be analyzed and processed using techniques described herein.

Additionally, the software modules may be described in examples below as operating on low-cost, consumer-grade hardware such as a generally-programmable graphics processing unit (GPGPU) having multiple processing cores, such as hundreds of processing cores. However, it should be appreciated that embodiments are not limited to operating with GPGPUs or any other form of graphics processing unit (GPU), as other hardware types are possible. For example, in some embodiments, a central processing unit (CPU) having multiple cores may be used, while in other embodiments a combination of one or more CPUs having multiple cores and one or more GPGPUs having multiple cores may be used. In still other embodiments, other types of processing units that have or can be arranged as multiple processing cores, such as one or more Field-Programmable Gate Arrays (FPGAs) arranged to include multiple processing cores or Application Specific Integrated Circuits (ASICs) that include multiple processing cores, may be used alone or in combination with a CPU and/or a GPU. Embodiments are not limited to operating with any particular form of hardware.

FIG. 2 illustrates an example of a computer system in which some embodiments may operate. The computer system 200 illustrated in FIG. 2 is an example of a financial trading environment in which a trading system operating according to techniques described herein may carry out OTC trades, including foreign exchange trades.

The computer system 200 includes multiple different components of a financial trading environment, including computing devices and sources of data operated by multiple different parties to financial trades. The computer system 200, as illustrated in FIG. 2, includes multiple sources of data 202A, 202B, 202C. Each of the sources of data 202A, 202B, 202C may be a source of trading information and may be implemented in any suitable manner as any suitable source of data. In some embodiments, for example, the sources of data may be computing devices operated by trading parties that execute automated processes for determining trading information for trades to be executed by the devices on behalf of the trading parties. In other embodiments, the sources of data may be database servers or other computing devices that may communicate trading information, as embodiments are not limited to operating with any particular type of data source.

The trading information available from the sources of data may include any suitable information about financial trades that may be carried out in the financial trading environment. For example, the trading information may identify, for a trading party that is advertising a potential financial trade, buy and/or sell prices for trades that the trading party is willing to make, and may also include a volume of a trade that the party is willing to execute at that price. In addition to price and volume, the trading information may include information identifying the trade, such as information identifying items to be traded. For example, for a trade of one currency for another, the two currencies to be traded may be identified.

Each of the sources of data 202A, 202B, 202C may be a source of trading information for an entity that is a party to potential trades. For example, source of data 202A may be a source of trading information for one bank, source of data 202B may be a source of trading information for another bank, and source of data 202C may be a source of information compiled by an aggregator of trading information that includes prices from multiple other potential parties. While three sources of data are illustrated in the example of FIG. 2, it should be appreciated that embodiments are not limited to operating with any particular number of sources of data.

Trading information provided by the sources of data 202A, 202B, 202C may be received by a bridge 206, which may be any application executing on any suitable computing device able to receive and process trading information. In some embodiments, the bridge 206 may be a computing device, dedicated to operating as a bridge, that is configured with hardware and/or software to operate as a liquidity bridge to carry out foreign exchange trading operations. When implemented as a liquidity bridge, the bridge 206 may operate according to any suitable techniques, including known techniques, for operating a liquidity bridge. In other embodiments, the bridge 206 may be a software program executing on a processing unit of a device. The bridge 206, when implemented as a program, may be executed on any suitable device to execute trades, including device 208.

Bridge 206 may process data received from the multiple different sources of data in any suitable manner. For example, the bridge 206 may aggregate the trading information received from the multiple different sources of data and store the trading information in one location to be later retrieved for analysis. As another example, the bridge 206 may reformat trading information received from each of the multiple sources of data, such as in the case where trading information is received from different sources of data in different formats. To aid in subsequent review and analysis of trading information received from the multiple different sources of data, the bridge may reformat trading information received from the multiple sources of data, such that the trading information is in one consistent format. In embodiments in which the bridge 206 reformats the data, the bridge 206 may reformat the data in any suitable manner and store the data in any suitable format, as embodiments are not limited in this respect.
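
As a minimal sketch of the reformatting and aggregation that a bridge may perform, the following Python example converts trading information from two differently formatted sources into one consistent record and stores the records in one location. The record layout, the two input formats, and all function names are hypothetical and are chosen only for illustration; embodiments are not limited to any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """One consistent record format for an advertised trade (hypothetical)."""
    source: str
    base: str      # currency being bought or sold
    quote: str     # currency in which the price is expressed
    bid: float
    ask: float
    volume: float


def from_pair_string(source, raw):
    # Hypothetical format of one source:
    # {"pair": "EUR/USD", "bid": "1.1000", "ask": "1.1002", "qty": "1000000"}
    base, quote = raw["pair"].split("/")
    return Quote(source, base, quote,
                 float(raw["bid"]), float(raw["ask"]), float(raw["qty"]))


def from_csv_line(source, line):
    # Hypothetical format of another source: "EUR,USD,1.1001,1.1003,500000"
    base, quote, bid, ask, volume = line.strip().split(",")
    return Quote(source, base, quote, float(bid), float(ask), float(volume))


def aggregate(quotes):
    """Store all reformatted quotes in one place, keyed by currency pair."""
    book = {}
    for q in quotes:
        book.setdefault((q.base, q.quote), []).append(q)
    return book
```

In this sketch, later analysis reads only `Quote` records, so the analysis need not be aware of the format in which each source of data originally provided its trading information.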

The bridge 206 may be communicatively connected in any suitable manner to each of the sources of data 202A, 202B, 202C to receive the trading information. FIG. 2 illustrates the bridge 206 connected to the sources of data via a communication network 204. The communication network 204 may include any suitable wired and/or wireless communication medium. In some embodiments, the communication network 204 may include multiple direct fiber-optic connections between the bridge 206 and each of the sources of data 202A, 202B, 202C, such that the bridge 206 has a direct and dedicated connection to each of the sources of data. A fiber-optic cable may be used in some embodiments in which the bridge 206 is co-located with the sources of data 202A, 202B, 202C, such as by being located in the same data room. In other embodiments, however, the communication network 204 may include one or more local and/or wide-area networks, including an enterprise network and/or the Internet. Embodiments are not limited to operating with any particular type of connection between the bridge 206 and the sources of data 202A, 202B, 202C.

In addition to receiving and processing trading information, the bridge 206 may also transmit communications, on behalf of an operator of the bridge 206, to execute potential trades identified by the trading information received from the sources of data. Executing the potential trade may include attempting to complete a trade and/or completing a trade. The operator of the bridge 206 may be any suitable entity, including an owner of the bridge 206, a subscriber to a service with which the bridge 206 is connected, a human user of the bridge 206, or any other entity on behalf of which trades may be executed. The bridge 206 may communicate with any suitable destination to execute a trade identified by the trading information, including by communicating to one or more of the sources of data. The bridge 206 may transmit any suitable communication to the destination to execute a trade, including using known communications and known techniques for automatically executing trades, as embodiments are not limited in this respect.

The bridge 206 may execute a trade on behalf of an operator of the bridge 206 in response to any suitable instruction identifying a trade to be executed. For example, the bridge 206 may receive instruction from a human user to execute a trade and, in response to the instruction from the human user, communicate to a destination to execute the trade instructed by the human user. Additionally or alternatively, the bridge 206 may receive instruction to carry out a trade from an automated trading system that is analyzing the trading information received from the sources of data and identifying desirable trades. The automated trading system may identify desirable trades based on any suitable criteria, including by determining potential trades identified by the trading information that have the highest potential for profit. In accordance with techniques described herein, the trading system may be implemented as a collection of software modules executing in parallel on low-cost, consumer-grade multicore processing units.

As discussed above, trading information for OTC trades, including foreign exchange trades, may be complex and contain multiple different variables, each of which may be changing quickly. For example, the bridge 206 may receive trading information from the source of data 202A multiple times per second, including hundreds or thousands of times per second. In addition, the trading information received from the source of data 202A may include multiple different prices for multiple different potential trades, each of which may be changing each time the trading information is received from the source of data 202A. Similar trading information may be received at a similar rate from each of the other sources of data. In addition, a trading party that releases the trading information, such as a trading party that operates one of the sources of data, may only honor prices for trades identified by trading information for a relatively small window of time. For example, once trading information for a potential trade, such as the price associated with the potential trade, changes, the trading party may stop honoring previous trading information immediately or after a short time. After that time, the trading party may decline to execute the trade identified by the trading information. Analyzing the multiple different, rapidly-changing pieces of data in trading information quickly enough to ensure that a desirable trade can be identified and executed while a potential counterparty to a potential trade will still approve and complete the trade is a complex process. In addition, analyzing trading information to identify a sequence of multiple trades, such as a sequence of trades that may be carried out for financial arbitrage (described above in connection with FIG. 1), quickly enough for each of the trades in an identified sequence to be approved and completed by the other parties to those trades is a complex process.
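
To illustrate the short window during which advertised prices may be honored, the following Python sketch keeps only the most recent update for each source and currency pair and discards updates older than an assumed window. The dictionary keys, the function name, and the `honor_window` parameter are hypothetical; an actual counterparty's honoring behavior is not specified here and may differ.

```python
def latest_fresh_quotes(updates, now, honor_window):
    """Return the newest update per (source, pair) that is still within
    an assumed honoring window, measured in seconds."""
    latest = {}
    for u in updates:
        key = (u["source"], u["pair"])
        # A newer update supersedes the previous trading information.
        if key not in latest or u["received_at"] > latest[key]["received_at"]:
            latest[key] = u
    # Drop updates that are too old to be relied upon.
    return {k: u for k, u in latest.items()
            if now - u["received_at"] <= honor_window}
```

Any analysis of the surviving updates would itself need to complete within the remaining portion of the window, which is one reason execution speed matters in this setting.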

Some trading systems may add to this complexity by considering other factors in determining whether particular trades are desirable. For example, in some embodiments, trading systems may consider other factors in addition to an anticipated profit associated with a potential trade, such as factors relating to a likelihood of a counterparty to a potential trade approving and completing the trade. Considering a likelihood of a counterparty to a potential trade approving the potential trade and the potential trade being executed and completed may be advantageous because the trading system may identify that a potential trade that is not likely to be approved is not desirable, and therefore not attempt to execute the potential trade even if the potential trade may be profitable. The trading system may therefore, in some cases, attempt to avoid spending time attempting to execute a potential trade that is ultimately not approved by a counterparty. The system may instead attempt to execute one or more trades that are more likely to be approved, even if these trades have a lower anticipated profit than some trades that are unlikely to be completed.

Therefore, in some embodiments, a trading system may consider, for a trade with a counterparty identified by trading information received from the counterparty, in addition to price, factors that may indicate a likelihood of a potential trade being approved by a counterparty. Such factors that are indicative of whether a trade may be approved may include a number of trades recently executed by the operator of the bridge 206 with the counterparty. Such recent trades may be trades executed by the operator with the counterparty within a past amount of time, such as the past minute, the past five minutes, the past day, or any other suitable unit of time. The number of recent trades may be considered by a trading system because some counterparties may monitor this number and deny trades with the operator when the number is too high. A trading system may therefore consider a number of recent trades when determining a likelihood of a trade being executed. Additionally or alternatively, a trading system may consider, when determining likelihood, a number of potential trades that the trading system identified as desirable and attempted to execute, but that were denied by the counterparty to the trade. By considering the number of recent trades that were denied by a counterparty, the trading system can account for a trading party that has been recently denying trades and attempt to avoid trades with that party. The system may, for example, adjust the likelihood of a potential trade being approved to indicate that the new potential trade is less likely to be executed when the trading system detects that the counterparty to that new potential trade has recently been denying trades. Any other suitable factors may be considered by a trading system in determining a likelihood of a potential trade being approved by counterparties and executed, as embodiments are not limited to evaluating any particular factors when determining whether a potential trade is desirable, including whether a potential trade is likely to be executed.
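
One way in which anticipated profit may be weighed against a likelihood of approval can be sketched as follows. The scoring formula, the default constants (`trade_limit`, `base`), and the function names are assumptions made for illustration only; no particular formula is prescribed, and any suitable factors may be combined in any suitable manner.

```python
def approval_likelihood(recent_trades, recent_denials, trade_limit=20, base=0.95):
    """Heuristic estimate in [0, 1] that a counterparty approves a trade.

    Assumed model: the estimate falls linearly as the number of recent
    trades approaches an assumed limit, and halves for each recent denial.
    """
    volume_factor = max(0.0, 1.0 - recent_trades / trade_limit)
    denial_factor = 0.5 ** recent_denials
    return base * volume_factor * denial_factor


def most_desirable(candidates, history):
    """Pick the candidate with the highest profit weighted by likelihood.

    `history` maps a counterparty to (recent_trades, recent_denials).
    """
    def score(candidate):
        trades, denials = history.get(candidate["counterparty"], (0, 0))
        return candidate["profit"] * approval_likelihood(trades, denials)

    return max(candidates, key=score)
```

Under these assumed numbers, a trade with a lower anticipated profit can be selected over a more profitable trade when the counterparty to the more profitable trade has recently been denying trades or has recently executed many trades with the operator.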

Accordingly, trading systems that evaluate trading information received by the bridge 206 to identify desirable trades and instruct the bridge 206 to execute trades identified as desirable may need to quickly execute complex processes for evaluating the trading information. As discussed above, Applicants have recognized and appreciated that performing such complex processes quickly on low-cost hardware may be enabled using techniques described herein.

The computer system 200 of FIG. 2 illustrates a computing device 208 in communication with the bridge 206 that may evaluate trading information received by the bridge 206, identify trades to be executed by the bridge 206, and instruct the bridge 206 to execute the trades. The computing device 208 includes one or more multicore processing units on which a trading system may execute to perform the evaluation, identification, and instruction of the trading system.

The computing device 208 may be implemented in any suitable manner having any suitable form factor. In some embodiments, for example, the computing device 208 may be implemented as a desktop or laptop personal computer. In other embodiments, the computing device 208 may be implemented as a rack-mounted server or multiple rack-mounted servers that are connected together in a manner that provides low latency for communications between the rack-mounted servers. The computing device 208 includes one or more multicore processing units to execute operations of a trading system in parallel. The multicore processing units may be low-cost multicore processing units, including consumer-grade multicore processing units.

In the example of FIG. 2, the multicore processing units of the computing device 208 include one or more central processing units 210 and one or more generally-programmable graphics processing units 212. The central processing units 210 include multiple processing cores 210A, each of which can be operated individually and in parallel with other processing cores of the central processing units 210 to execute instructions of the trading system. The graphics processing units 212 also include multiple processing cores 212A, which also can be operated individually and in parallel with one another to execute instructions of a trading system.

The computing device 208 additionally includes one or more storage media 214 to store instructions for execution on the multicore processing units and to store data to be processed by the multicore processing units. As illustrated in FIG. 2, the storage media 214 store an input facility 216, a trading system 218 including software modules, a management facility 220 that includes a scheduling facility 220A and an interpretation facility 220B, and trading information 222.

The input facility 216, when executed by one or more of the multicore processing units, may communicate with the bridge 206, receive trading information from the bridge 206, and store the trading information as trading information 222 in the storage media 214.

The trading system 218 includes multiple different software modules, such as tens, hundreds, or thousands of software modules, that may be executed in parallel on different processing cores of the multicore processing units 210, 212 of the computing device 208. When the software modules of the trading system 218 are executed in parallel on the processing cores of the multicore processing units 210, 212, instructions of each of the software modules that correspond to a portion of the trading system are executed. By executing in parallel on the processing cores of the multicore processing units 210, 212, the software modules of the trading system 218 can execute quickly and efficiently to perform operations of the trading system 218.

The management facility 220 may manage a multicore processing unit, such as by managing the graphics processing unit 212. The management facility 220 may manage the graphics processing unit 212 by managing interactions between hardware of the graphics processing unit 212 and other components of the device 208. In some embodiments, the management facility 220 may be a device driver for the unit 212 or may perform operations of a device driver for the unit 212. The management facility may accept configuration information for configuring the graphics processing unit 212 and may carry out operations to configure the unit 212 based on the configuration information. As part of configuring the unit 212 based on the configuration information, a scheduling facility 220A and an interpretation facility 220B of the management facility 220 may carry out configuration operations. As discussed in greater detail below, the scheduling facility 220A may schedule software modules for execution on cores 212A of the graphics processing unit 212 according to scheduling constraint information contained within the configuration information. Also as discussed in greater detail below, in embodiments in which software modules of the trading system 218 are not formatted in a way that is executable by the cores 212A, the interpretation facility 220B may reformat the software modules for execution. The interpretation facility 220B may reformat the software modules in any suitable manner. In some embodiments, the interpretation facility 220B may examine operations of the software modules and create sets of instructions that can be executed by the cores 212A. For example, in some embodiments the software modules of the trading system 218, when provided to the management facility 220 for execution by the cores 212A, may include instructions formatted in an intermediate language that the cores 212A cannot execute. The interpretation facility 220B may interpret the intermediate language and create, for each module, sets of instructions that can be executed by the cores 212A and that correspond to the operations of the software module.

As discussed briefly above and in detail below, Applicants have recognized and appreciated that executing the complex operations of a trading system (as well as complex operations outside of the financial industry, in other domains) on low-cost hardware can be enabled through the use of a software development tool for increasing execution efficiency of software modules. Such a software tool may evaluate software modules of a trading system, may automatically edit the software modules based on the evaluation, and may produce configuration information for multicore processing units based on the evaluation of the software modules. The software development tool may evaluate any other suitable information along with the software modules, including information regarding target hardware on which the modules are to be executed. For example, differences between cores of a multicore processing unit that is to execute software modules may be evaluated by the software development tool. The software development tool may automatically edit the modules themselves or edit a collection of interconnected modules to change the manner in which the modules communicate with one another, to add software modules to the collection, and/or to remove software modules from the collection. Configuration information produced by the software development tool may include information that may be provided to and processed by the management facility 220. The configuration information may, in some embodiments, include information that may be used to configure the scheduling facility 220A to schedule software modules for execution on cores 212A in a particular manner. In embodiments in which the interpretation facility 220B interprets software modules to produce sets of instructions for execution on cores, the configuration information may additionally or alternatively include information to configure the interpretation facility 220B. The information to configure the interpretation facility 220B may include information that affects a manner in which the interpretation is carried out and which instructions are output based on the interpretation. For example, the configuration information may configure the interpretation facility 220B to prefer particular types of instructions.
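
The effect of configuring an interpretation facility to prefer particular types of instructions can be sketched in a few lines of Python. The intermediate-language operation names, the instruction mnemonics, and the `LOWERINGS` table below are all invented for this illustration and do not correspond to any real instruction set or to any particular intermediate language.

```python
# Hypothetical table: each intermediate-language operation may be translated
# into one of several instruction sequences, each tagged with an instruction
# type. The first entry for an operation is the default translation.
LOWERINGS = {
    "load": [("memory", ["LD"])],
    "mul_add": [("arithmetic", ["FMA"])],
    "select_max": [
        ("logical", ["CMP", "BRANCH", "MOV"]),
        ("arithmetic", ["MAX"]),
    ],
}


def interpret(module_ops, preferred_types=()):
    """Translate a module's operations into executable instructions,
    choosing a preferred instruction type where an alternative exists."""
    instructions = []
    for op in module_ops:
        options = LOWERINGS[op]
        chosen = next(
            (seq for kind, seq in options if kind in preferred_types),
            options[0][1],  # fall back to the default translation
        )
        instructions.extend(chosen)
    return instructions
```

For instance, with no preference the hypothetical `select_max` operation is translated into a compare-and-branch sequence, whereas a configuration that prefers arithmetic instructions yields a single arithmetic instruction, which may suit cores that are specially adapted to arithmetic operations.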

Illustrative techniques for operating such a software development tool are discussed in detail below. In the example of FIG. 2, the software modules of the trading system 218 are evaluated and modified by such a software development tool. In addition, the software development tool produces configuration information used by the scheduling facility 220A. The scheduling facility 220A may be a portion of a management facility 220 for one or more of the multicore processing units 210, 212 and may be responsible for assigning software modules for execution in parallel on the processing cores of the one or more of the multicore processing units 210, 212. In cases in which the number of software modules of the trading system 218 is greater than the number of processing cores of the multicore processing units, the scheduling facility 220A may be responsible for scheduling the software modules for execution at different times. In addition, the scheduling facility 220A may assign particular software modules to particular processing cores based on the configuration information produced by the software development tool.
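
A minimal sketch of such scheduling, assuming a simple model in which every module occupies one core for one time slot, is shown below. The `pinned` mapping stands in for assignments of particular modules to particular cores that configuration information might carry; its form, the function name, and the shortest-queue placement rule are assumptions and do not describe the actual scheduling facility.

```python
def schedule(modules, num_cores, pinned=None):
    """Return a list of time slots; each slot maps a core index to the
    module that executes on that core during the slot."""
    pinned = pinned or {}
    queues = [[] for _ in range(num_cores)]

    # Modules assigned to a particular core by configuration go first.
    for m in modules:
        if m in pinned:
            queues[pinned[m]].append(m)

    # Remaining modules go to whichever core currently has the least work.
    for m in modules:
        if m not in pinned:
            min(queues, key=len).append(m)

    # When there are more modules than cores, execution spans several slots.
    num_slots = max((len(q) for q in queues), default=0)
    return [
        {core: q[t] for core, q in enumerate(queues) if t < len(q)}
        for t in range(num_slots)
    ]
```

With five modules, two cores, and one module pinned to the second core, this sketch produces three time slots, the last of which leaves one core idle.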

In FIG. 2, the computing device 208 is illustrated as connected to a computing device 224. In the computer system 200 of FIG. 2, the software development tool that evaluates software modules of the trading system 218 and produces configuration information may be executed on the computing device 224. The software development tool may be implemented as one or more functional facilities that may execute on one or more processing units (including multicore processing units) of the computing device 224 to perform the functions of the software development tool described herein. The computing device 224, upon evaluating software modules of the trading system 218, may configure the computing device 208 based on the evaluation conducted by the software development tool. Configuring the computing device 208 may be carried out in any suitable manner. The configuration may include, for example, storing the software modules of the trading system 218 that may have been modified by the software development tool, setting configuration parameters of hardware of the computing device 208 (including the multicore processing units), and providing the configuration information to the management facility 220. The management facility 220 may then, based on the configuration information, interpret the modules to produce sets of instructions for the modules and schedule the instructions for the modules for execution on one or more cores 212A. Examples of the types of configuration that may be carried out by the software development tool are discussed in greater detail below.

While FIG. 2 illustrates the computer system 200 as including one bridge 206, one computing device 208, and one computing device 224, it should be appreciated that embodiments are not limited to operating with any particular number of bridges 206, computing devices 208, and computing devices 224. In some embodiments, for example, the bridge 206, computing device 208, and/or computing device 224 may be implemented as a system of multiple devices operating together to, for example, balance a load on the bridges 206, computing devices 208, and/or computing devices 224. Additionally, in other embodiments, bridge 206, computing device 208, and computing device 224 may be implemented as a single computing device executing the functionality of these devices described above.

Additionally, it should be appreciated that while the management facility 220 of FIG. 2 was discussed in connection with the graphics processing unit 212, some embodiments may additionally or alternatively include a management facility that manages the central processing unit 210. A management facility for the central processing unit 210 may manage the unit 210, including by scheduling modules for execution on the unit 210. Further, it should be appreciated that while some embodiments, like the embodiment of FIG. 2, may include an interpretation facility 220B that interprets software modules written in one language or formatted in one manner and produces sets of instructions for the modules that may be executed by cores, embodiments are not limited in this respect. In other embodiments, software modules evaluated by the software development tool and provided to a management facility 220 or to a multicore processing unit may include instructions in a format that can be executed by cores of the multicore processing unit.

FIG. 3 illustrates the multicore processing units 210, 212 and storage media 214 of the computing device 208 of FIG. 2 in greater detail. As in FIG. 2, the computing device 208 is illustrated in FIG. 3 as including one or more central processing units 210, one or more graphics processing units 212, and storage media that include shared memory 214 that is shared between the central processing unit(s) 210 and graphics processing unit(s) 212. Each of the processing units 210, 212 may read data from the shared memory 214 and write data to the shared memory 214. Accordingly, the processing units 210, 212 may exchange data with one another by storing the data in shared memory 214. The central processing unit(s) 210 is also illustrated as including an on-chip cache 210B that may store data for processing by each of the processing cores 210A. Similarly, the graphics processing unit(s) 212 is illustrated as including an on-chip cache 212E that may store data for processing by each of the processing cores of the graphics processing unit(s) 212.

In some embodiments, a multicore processing unit may additionally or alternatively include one or more other forms of storage not illustrated in FIG. 3. For example, each processing core 210A and each processing core 212A-212D may include a local cache that may be used by software modules that execute on the processing core. As another example, each processing core may be assigned to a block of processing cores that share access to a storage, such as sharing access to a particular memory or a particular region of memory. In some cases, the storage to which the processing cores of the block share access may be a block-shared cache that is accessible to processing cores of the block, but not accessible to other processing cores, or to which the processing cores of the block have preferred access such that the cores of the block are given priority when requesting access to the block-shared cache. Other forms of storage may also be included in a processing unit, as embodiments are not limited to operating with processing units that include any particular forms of storage.

The processing cores of the multicore processing units 210, 212 mayinclude multiple different types of processing cores. Some of thesetypes of processing cores may be specially adapted to execute someinstructions or types of instructions. For example, one type ofprocessing core may include one or more components that permit cores ofthe type to execute some instructions in a manner that is different fromthe manner in which cores of other types may execute those instructions.The components may permit the cores of this type to execute theinstructions more quickly or more efficiently than cores of other types.Other types of cores may also include one or more components that permitthose cores of other types to execute other instructions in a differentmanner. Each type of core may include components that permit that typeof core to execute one or more instructions in a particular manner, suchas by executing the one or more instructions more quickly or efficientlythan other cores. The instructions a core is specially adapted toexecute may be any suitable one or more instructions. In some cases, theinstructions may be instructions of a particular type, such as memoryaccess instructions or logical instructions. Cores that are of a typethat is specially adapted to perform some instructions may be able toexecute other instructions, but may not be specially adapted to performthese other instructions and may execute the other instructions in amanner the same or similar to the way in which the other instructionsmay be executed by other cores not specially adapted to execute thoseother instructions. The components included in cores that permitdifferences in execution may include hardware and/or softwarecomponents. 
For example, a type of processing core may include hardware(e.g., arrangements of logic gates, memory, buses, and/or otherelectrical components) that is not included in other types of processingcore and that permits the type of processing core to execute someinstructions quickly or efficiently. For example, processing cores of acentral processing unit 210 may be configured to execute a variety ofdifferent instructions, including arithmetic instructions and logicalinstructions. In some embodiments, however, the processing cores of acentral processing unit 210 may not be specially adapted to execute anyparticular instructions more quickly or efficiently than others, butinstead may be generally adapted to execute the arithmetic and logicalinstructions. In contrast, processing cores of a graphics processingunit 212 may be specially adapted to execute one or more particulartypes of instructions. For example, many of the processing cores of agraphics processing unit 212 may be processing cores 212A that arespecially adapted to execute arithmetic operations, including vectoroperations, quickly and efficiently, but may not be able to executelogical instructions with the same quickness or efficiency. Logicaloperations may include comparison operations, Boolean operations, andconditional operations. Some of the processing cores of a graphicsprocessing unit 212, however, may be adapted to execute logicalinstructions more quickly and efficiently than the processing cores 212Aof the graphics processing unit 212. Processing cores 212B of FIG. 3,for example, may be adapted to execute logical operations more quicklyand efficiently than processing cores 212A. However, in some graphicsprocessing units, processing cores 212B may not be able to executelogical operations as quickly and efficiently as processing cores 210Aof central processing unit 210. 
Processing cores 212C of a graphics processing unit 212 may be specially adapted to perform memory access operations to read and/or write data to the on-chip cache 212E more quickly and efficiently than other processing cores of a graphics processing unit 212. Similarly, processing cores 212D may be specially adapted to perform memory access instructions to read and/or write data to the shared memory 214 more quickly and efficiently than other processing cores of a graphics processing unit 212. As another example of the ways in which processing cores may be adapted to perform different types of operations, processing cores of a central processing unit may be capable of performing operations for communicating via a communication network, such as by sending information to or receiving information from a network interface of a computing device of which the processing core is a part. In some graphics processing units, however, processing cores may not be capable of performing such operations for communication via a network, such as because the processing cores may not be capable of communicating with a network interface. Other processing cores may be specially adapted to execute other types of instructions.

A software development tool, operating according to techniques described herein to evaluate software modules of a complex processing system (such as a trading system to evaluate trading information and identify desirable trades), may produce configuration information dependent in part on such special adaptations of processing cores. In some embodiments, a software development tool may be configured with information regarding different types of special adaptation of processing cores of different types of multicore processing units, including central processing units, graphics processing units, FPGAs, or other forms of multicore processing units. When such a software development tool evaluates software modules to be executed on multicore processing units, as discussed in greater detail below, the software development tool may account for the special adaptation of processing cores when producing configuration information. For example, a software development tool may be provided with information regarding target hardware on which the software modules are to be run, and the software development tool may evaluate the software modules based on special adaptations of processing cores of the multicore processing units of the target hardware.

As discussed in greater detail below, the software development tool may account for differences between processing cores by selecting instructions to be included in software modules and/or by influencing scheduling of modules for execution on processing cores.

For example, in some embodiments the software development tool may change instructions included in a software module based on capabilities of a processing core on which the module may efficiently execute. The software development tool may change the instructions by exchanging one or more instructions included in the module for one or more other instructions that may execute more quickly and efficiently on a particular type of processing core. In embodiments in which the software modules evaluated by the software development tool include instructions formatted according to an intermediate language that are not executable by processing cores, the software development tool may influence a manner in which an interpretation facility selects instructions based on the intermediate language. For example, the software development tool may exchange instructions of the intermediate language in the module for other intermediate-language instructions that, when interpreted by the interpretation facility, would result in the interpretation facility outputting one or more instructions that would be quickly and efficiently executed by a processing core. As another example, the software development tool may influence a manner in which the interpretation facility interprets instructions of the intermediate language, and thereby affect which instructions the interpretation facility chooses as corresponding to instructions of the intermediate language. For example, the configuration information may configure the interpretation facility to prefer, when interpreting a software module, instructions that will execute quickly and efficiently for a particular type of processing core on which the software module is to be executed.

The software development tool may, in some embodiments, influence scheduling of software modules for execution on processing cores based on differences between processing cores. For example, the software development tool may produce configuration information including scheduling information that identifies that particular software modules or types of software modules should be assigned by a scheduling facility for execution to processing cores having particular adaptations. For example, a type of software module that includes particular instructions or operations may be assigned to a processing core that is able to carry out those instructions/operations more quickly or efficiently.
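The assignment of modules to specially adapted cores described above can be sketched as follows. This is a minimal illustration, not the tool's actual logic: the instruction categories, the rule of choosing the most common category in a module, the module names, and the mapping onto the core labels 212A, 212B, and 212C are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from an instruction category to the core type assumed
# to be specially adapted for it (labels borrowed from the description above).
CORE_FOR_CATEGORY = {
    "arithmetic": "212A",
    "logical": "212B",
    "memory": "212C",
}


def dominant_category(instructions):
    """Return the most common instruction category in a module."""
    counts = Counter(category for category, _ in instructions)
    return counts.most_common(1)[0][0]


def build_scheduling_info(modules):
    """Map each module name to the core type suited to its dominant category."""
    return {
        name: CORE_FOR_CATEGORY[dominant_category(instructions)]
        for name, instructions in modules.items()
    }


# Hypothetical modules, each a list of (category, mnemonic) pairs.
modules = {
    "profit": [("arithmetic", "mul"), ("arithmetic", "add"), ("memory", "load")],
    "select": [("logical", "cmp"), ("logical", "branch"), ("arithmetic", "sub")],
}

scheduling_info = build_scheduling_info(modules)
print(scheduling_info)
```

In this sketch the resulting dictionary plays the role of the scheduling information that a scheduling facility could consult; a real tool might weigh instruction costs rather than simply counting categories.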

By influencing the instructions of a software module that will be executed and/or by causing software modules to be assigned by a scheduling facility to processing cores based on the types of instructions to be executed by the software modules, the software development tool may be able to configure multicore processing units of low-cost hardware to quickly and efficiently execute instructions for complex problems.

A software development tool operating in accordance with techniques described herein may be implemented in any suitable manner and may carry out any suitable operations to evaluate software modules for execution on multicore processing units. Examples of ways in which the software development tool may be implemented are discussed in detail below, though it should be appreciated that each of the examples below is merely illustrative of ways in which such a software development tool may be implemented, and embodiments are not limited to being implemented according to any one or more of the examples described below.

FIG. 4A illustrates an example of an environment in which a software development tool operating according to techniques described herein may be implemented and may be used. The computing environment of FIG. 4A includes two environments, a development environment and a production environment, which represent two different primary phases of software development. In the development environment, software is designed, written, tested, and otherwise created. In the production environment, the software that was created in the development environment is executed on one or more processing units and performs the functions for which the software was designed.

Multiple different tools may be used in a development environment for creating software. For example, code editing tools, build/compiling tools, debugging tools, configuration management tools, and other tools may be used in the development environment as development tools for developing software modules. Once software modules have been developed using the development tools of the development environment, the software modules that were developed may be evaluated using one or more evaluation tools. Evaluation tools for evaluating software modules that have been developed may include tools for determining code coverage, memory management, and otherwise evaluating properties of the code or properties of the execution of the software modules. In accordance with techniques described herein, the evaluation tools of a development environment may also include a software development tool for analyzing software modules of a complex system to determine how to quickly and efficiently execute instructions for a complex problem on low-cost hardware, including on processing cores of one or more multicore processing units. The evaluation tools may be designed to evaluate software modules using a test environment that mimics the production environment in which the software modules may be executed. This may be done so that configuration decisions made by the software development tool in the development environment, which may be made so as to increase efficiency and speed of execution in the development environment, may be applied in the production environment to increase efficiency and speed of execution in the production environment.

Examples of operations that may be carried out by a software development tool to determine how to execute software modules for a complex problem quickly and efficiently are discussed in detail below. In general, the software development tool may evaluate source and/or object code of software modules of a complex system to determine how to configure target hardware on which the software modules will be executed in the production environment to quickly and efficiently execute instructions of the software modules. In some embodiments, software modules that are evaluated by a software development tool may be evaluated when written in an interpreted language or a semi-compiled state. For example, once code is written for the software modules, rather than leaving the code in a source language or in an object code language corresponding to target hardware, the code may be translated into another language that the software development tool is configured to review. By using an interpreted or semi-compiled language, the software development tool can review different software modules written in different source languages and for different target hardware without needing to be configured to understand each of the available language or hardware options.

Following the evaluation of the software modules, the software development tool may produce configuration information for configuring the target hardware of the production environment. The configuration information produced by the software development tool may include any suitable information for configuring target hardware to execute instructions for the complex system. The configuration information may include the software modules to be interpreted and/or executed, which may have been generated and/or edited by the software development tool as discussed below. The configuration information may also include any suitable information that may be used by an interpretation facility and/or by a scheduling facility of a multicore processing unit for determining processing cores to which to assign software modules for execution and times at which to assign the software modules to the processing cores for execution.

FIG. 4B illustrates a software development process that may be carried out in some embodiments in the environment illustrated in FIG. 4A. It should be appreciated, however, that embodiments are not limited to carrying out a software development process like the one illustrated in FIG. 4B, and that embodiments are not limited to carrying out a software development process in the environment illustrated in FIG. 4A.

The process 400 of FIG. 4B begins in block 402, in which, during a design phase of a software development process, software developers identify a complex problem to be solved. The complex problem may be any suitable problem that may require multiple operations to be carried out and that may require that operations be carried out on multiple different pieces of data. The problem may be one that is designed to be solved once through a single execution of operations to produce a solution, or may be a problem that is designed to be solved repetitively for different input data. A repetitive problem may be, for example, a problem for which operations may be repeated each time a condition is met, such as by producing a solution in response to receiving new data in a stream.

Once the complex problem to be solved is identified, in block 404 the software developer identifies the steps of a solution to the problem and identifies the one or more pieces of data to be processed in the problem. Techniques described herein for operating low-cost hardware to execute operations quickly and efficiently for complex problems may operate in any suitable manner with problems that are complex in any way. In many cases, complex problems for which techniques described herein may be useful may be complex for one of two reasons: the complex problems include multiple different types of operations that are to be carried out, or the complex problems include multiple different pieces of data that are to be processed. In some cases in which complex problems are complex because they include multiple different types of operations, the multiple different types of operations may be performed on a relatively small data set, with low variability in the data or types of data to be processed by different operations. In some cases in which complex problems are complex because they include multiple different pieces of data, multiple different pieces of data may be processed using a relatively small set of operations, with low variability in the types of operations to be carried out for different pieces of data.

In accordance with techniques described herein, once the software developer identifies the steps of the solution to the problem and identifies the data to be processed, the software developer creates software modules to be executed and specifies the sources of data to be processed by each of the software modules. Speed and efficiency of execution may be increased when operations are executed on a multicore processing unit by designing the operations to be executed in parallel, which can be achieved by separating operations into different software modules. By separating operations into different software modules, the different modules may be executed in parallel, at the same time, which may increase speed and efficiency of execution. Accordingly, a software developer may create multiple different software modules that each include instructions for carrying out some of the operations that form a part of the solution to the complex problem and that may each perform processing on some of the data to be processed as part of the complex problem. Each of the software modules may also be configured to receive the data from a particular source, or to receive a particular type of data formatted in a particular manner.

As discussed above, however, creating software modules for quick and efficient execution on low-cost hardware is difficult and may require intimate knowledge of the operations of multicore processing units and the manner in which a multicore processing unit executes particular instructions or types of instructions. Accordingly, when the software developer creates the software modules and specifies the type/source of data to be processed by each of the software modules, the software developer may not have created the modules and specified data in a manner that would result in quick and efficient execution of the software modules on a multicore processing unit. Rather, in some cases, the software modules created by the software developer may execute slowly on a multicore processing unit.

Software modules may execute slowly on a multicore processing unit for any of a variety of reasons. As one example, if instructions are divided into software modules too finely, this may result in a very large number of software modules each executing a relatively small number of instructions. In some cases, the number of modules may exceed the number of cores of a multicore processing unit. To execute the software modules, then, a scheduling facility for a multicore processing unit may perform context switches on processing cores to configure a core to execute different modules at different times. When there is a large number of software modules, the scheduling facility may have to carry out a large number of context switches. When a software module is to be executed and a context switch is performed, the instructions of the software module are made available to the processing core and data to be processed by that software module is made available to the processing core. The instructions and data may be made available by loading each into an on-chip cache of the multicore processing unit or of an individual processing core, or in any other suitable way. Additionally, as part of the context switch, the instructions and data for a prior software module may be moved from a storage for a processing core to another storage, such as from an on-chip cache to a system memory. Performing such a context switch by loading and unloading instructions and data may take a relatively long time and the multicore processing unit may not execute instructions during context switches, which may result in delays of execution. When context switches have to be performed a large number of times, the delays for each context switch can result in execution times for software modules that are very long.
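The effect of dividing instructions too finely can be illustrated with a rough cost model. The sketch below is an assumption-laden simplification rather than a description of any real scheduling facility: it assumes modules of equal size, an even spread of modules over cores, one context switch per module after the first on each core, and arbitrary time units. The function name and all numeric values are hypothetical.

```python
import math


def estimated_runtime(total_instructions, num_modules, num_cores,
                      time_per_instruction, context_switch_cost):
    """Estimate elapsed time for equally sized modules spread evenly over cores.

    Assumes each core runs its modules one after another and pays one
    context switch for every module after its first.
    """
    modules_per_core = math.ceil(num_modules / num_cores)
    instructions_per_module = total_instructions / num_modules
    compute_time = modules_per_core * instructions_per_module * time_per_instruction
    switch_time = max(modules_per_core - 1, 0) * context_switch_cost
    return compute_time + switch_time


# Same work (1,000,000 instructions, 8 cores), divided coarsely versus finely.
coarse = estimated_runtime(1_000_000, 8, 8, 1, 5_000)
fine = estimated_runtime(1_000_000, 8_000, 8, 1, 5_000)
print(coarse, fine)
```

Under these assumed numbers the compute time is identical in both cases, and the difference in the estimate comes entirely from the context switches incurred by the finely divided modules.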

As another example of a reason software modules may execute slowly, memory access operations to read and/or write data to memory may take a relatively long time to execute on a processing core. The operations may execute relatively slowly when data to be read from memory is not available in a local cache for the processing core that is to process the data and the data is retrieved from another cache or system memory for a device. Because of delays due to memory access operations, software modules that perform a large number of memory access operations for relatively small amounts of memory may take a relatively long time to execute. This may result in delays as a processing core may not execute another module while waiting for a memory access operation to complete, but instead may wait for the operation to finish.

As a third example of a reason software modules may execute slowly, exchanging information between a central processing unit and a graphics processing unit on a computing device may take a relatively long time as compared to memory access operations that are performed entirely within the central processing unit or entirely within a graphics processing unit. Accordingly, software modules that perform a large number of operations to exchange data between a central processing unit and a graphics processing unit may take a relatively long time to execute due to the time spent exchanging data. This may result in delays as the processing core waits for the operations to complete rather than executing another module or operation. As an example of such exchanges, in some contexts, operations executed by a graphics processing unit may result in information to be communicated via a network. Because a graphics processing unit may not be able to operate a network interface to carry out the communication, the graphics processing unit may communicate with a central processing unit. The central processing unit may in turn effect the communication over the network. Because of the delay caused by exchanging information between a graphics processing unit and a central processing unit, operations to determine whether to communicate via the network and operations to communicate via the network may take a long time to execute.

As another example of a reason that software modules may execute slowly, in some embodiments, a management facility for a multicore processing unit may not permit software modules to be provided to the management facility formatted using instructions that can be executed by cores of the multicore processing unit. Instead, the software modules may be provided to the management facility using a different language, such as an intermediate language. In these embodiments, as discussed above, an interpretation facility of the management facility may interpret the instructions formatted using the intermediate language of the input software modules and produce, as output, corresponding software modules that include sets of instructions that can be executed by the cores. If the instructions in the intermediate language include instructions that do not correspond to instructions that will execute quickly or efficiently on target hardware, the software modules, once interpreted, may execute slowly.

These difficulties in executing software modules on multicore processing units may be alleviated by creating software modules in particular ways given the types of instructions to be executed by the software modules or the types of data to be processed by software modules. For example, if a large number of different operations is to be conducted on the same pieces of data (such as when a complex problem is complex due to including a wide variety of operations, but not a wide variety of data), constructing software modules that include multiple different types of operations conducted on one piece of data may be beneficial. This may be because the large number of different operations within a module reduces the number of software modules that are constructed, reducing the number of context switches that must be carried out during execution of the software modules. Additionally, performing multiple different operations on data following one or a few memory access operations in a software module to retrieve that data may result in fewer memory access operations overall. By reducing the number of context switches and reducing the number of memory access operations, combining operations into fewer software modules may increase the speed and efficiency of execution of software modules. This may be so even though a common approach may be to separate the different operations into different software modules so as to increase the ability of these different operations to be performed in parallel.

Additionally, as discussed above, different processing cores of a multicore processing unit may be specially adapted to execute different types of instructions quickly and efficiently. For example, one processing core may be adapted to execute logical operations more efficiently than another processing core, and one processing core may be adapted to execute memory access operations more efficiently than another processing core. As such, configuring a multicore processing unit to execute different software modules on particular processing cores of the multicore processing unit based on the instructions included in a software module may be advantageous. For example, when a software module includes a certain type of instructions, a multicore processing unit may be configured to execute that software module on a certain type of processing core. By doing so, software modules with particular types of instructions may execute on certain types of processing cores. The software modules may therefore execute more quickly and efficiently.

Also, when operations that together form a solution of a complex problem are divided into multiple different software modules, in some cases some of the software modules may include operations that should be executed after operations of other software modules. For example, a software module may accept as input processed data output by another software module, and may produce further processed data as output, which may in turn be provided to another software module. In such a case, if a scheduling facility is not provided with information identifying a dependency between software modules, the scheduling facility may assign a software module to execute on a processing core before another software module on which it depends. In such a case, the dependent software module may wait for the input from the other software module, and delay execution until the other software module executes and provides the input. Identifying dependencies between software modules and making a scheduling facility aware of the dependencies, such that scheduling of execution of software modules accounts for the dependencies, can also lead to quicker and more efficient execution of software modules.
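One way to make a scheduler aware of such dependencies is to order modules topologically, so that no module is dispatched before the modules that produce its input. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) purely as an illustration; the module names A through D and the grouping into "waves" of modules that may run in parallel are assumptions for the example, not part of the described system.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency information: each module maps to the set of
# modules whose output it consumes.
dependencies = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
}


def schedule_in_waves(deps):
    """Group modules into waves; modules in one wave have all inputs ready."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


waves = schedule_in_waves(dependencies)
print(waves)
```

Modules in the same wave have no outstanding inputs and could be assigned to different processing cores at the same time, while a module such as D is never dispatched before C has produced its output.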

Further, as discussed above, different instructions may execute differently on processing cores of one or more multicore processing units, with some instructions executing more quickly or efficiently than others. Some types of processing core may execute some instructions more quickly or more efficiently than other types of processing core. An interpretation facility for a multicore processing unit may be adapted to evaluate intermediate language instructions and identify an efficient set of instructions that corresponds to the intermediate language instructions and can execute on processing cores. However, the interpretation facility may be arranged with default rules for interpretation that may generally result in efficient sets of instructions. These rules for interpretation may not, however, result in efficient sets of instructions for a particular software module. In some embodiments in which an interpretation facility interprets intermediate language instructions, the interpretation facility may be able to accept input that configures the interpretation that is to be performed, including by indicating that particular types of instructions should be preferred or should be avoided. For example, in some cases a default rule of an interpretation facility may indicate that Single Instruction, Multiple Data (SIMD) instructions should be created during an interpretation wherever possible, as SIMD instructions may, in general, be executed efficiently. The interpretation facility may also accept input, however, that identifies that SIMD instructions should not be used. Additionally, the interpretation facility may output different instructions based on different intermediate language instructions that are input. Thus, the instructions that cause a processing core to carry out an operation that are output by the interpretation facility may vary based on which instructions describing the operation are input to the interpretation facility.
Thus, by providing configuration inputs or particular intermediate language instructions to an interpretation facility, a particular set of instructions that may execute quickly or efficiently in a particular context, and that may not normally be produced by the interpretation facility, may be produced.
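The way a configuration input can change the output of an interpretation facility can be sketched with a toy translator. Everything here is hypothetical: the intermediate form (an operation name with a repeat count), the `prefer_simd` flag, the vector width of 4, and the output mnemonics such as `vadd4` are invented for illustration and do not correspond to any real interpretation facility or instruction set.

```python
def interpret(intermediate_ops, prefer_simd=True, vector_width=4):
    """Translate (operation, count) pairs into a list of output instructions.

    With prefer_simd enabled (the assumed default rule), groups of
    vector_width identical operations become one vector instruction.
    With it disabled, only scalar instructions are emitted.
    """
    output = []
    for op, count in intermediate_ops:
        if prefer_simd and count >= vector_width:
            full, rest = divmod(count, vector_width)
            output += [f"v{op}{vector_width}"] * full + [op] * rest
        else:
            output += [op] * count
    return output


default_output = interpret([("add", 9)])
scalar_output = interpret([("add", 9)], prefer_simd=False)
print(default_output)
print(scalar_output)
```

The same intermediate input yields two vector instructions plus one scalar instruction under the default rule, and nine scalar instructions when the configuration input indicates that SIMD instructions should be avoided.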

However, software developers who are not intimately familiar with the multicore processing units for which they are developing software may not be aware of advantages and disadvantages that may result from the different ways of dividing operations into software modules, for configuring a multicore processing unit for executing the software modules, or for specifying instructions for execution. Thus, software developers who are developing software modules for parallel execution on low-cost multicore processing units may benefit from a software development tool that evaluates software modules that have been created and are intended to be executed on one or more multicore processing units to determine whether the software modules created by a developer will execute quickly and efficiently. The evaluation may be carried out based on any suitable factors, including by analyzing the software modules individually or collectively in view of any of the difficulties in creating software modules for parallel execution mentioned above. Determining whether the modules will execute quickly and efficiently may include evaluating the instructions included in the modules, evaluating the data to be processed by the modules, and/or evaluating a collection of software modules and the manner in which the software modules interconnect and communicate with one another. The evaluation may include evaluating the modules in storage and/or evaluating the modules during execution of the modules by processing cores of one or more multicore processing units. Such a software development tool may evaluate the modules created by the software developer and may determine whether the software modules will execute quickly and efficiently on specified target hardware for the software modules. The software development tool may automatically edit the software modules as a result of this evaluation.
Editing the software modules may include editing an individual software module and/or editing the collection of software modules and/or the interconnections between the software modules. Editing the software modules may also include changing instructions included in a software module, such as by changing intermediate language instructions included in the software module, which may change the instructions that are output from an interpretation facility. In addition to or as an alternative to automatically editing software modules, the software development tool may produce information to be provided to the software developer as suggestions of how to change the software modules to improve the speed or efficiency of execution of the software modules. Further, the software development tool may produce configuration information for configuring target hardware, including one or more multicore processing units, for execution of the software modules.

Accordingly, in block 408 of FIG. 4B, the software modules created by the software developer in block 406 and the types of data specified in block 406 are evaluated using a software development tool. As a result of the evaluation, the software development tool produces configuration information. In block 410, one or more multicore processing units and the processing cores of the multicore processing unit(s) in a production environment are configured with the configuration information. In embodiments in which an interpretation facility creates sets of instructions for the software modules from instructions arranged in an intermediate language or another format that is not executable by processing cores, the interpretation facility may create the instructions for the modules in block 410. Then, in block 410 the processing cores may execute the software modules to perform processing on data provided to the cores.

Once the processing cores are configured and executing software modules in block 410, the process 400 ends. Following the process 400, the multicore processing unit(s) are able to execute operations for a complex processing system and may process the data to produce a solution to the complex problem.

As discussed above, techniques described herein may be used with any suitable type of complex problem. One type of complex problem for which software modules may be developed and for which software modules may be analyzed using a software development tool as described herein is financial arbitrage.

FIG. 5 illustrates an exemplary process 500 for developing software modules for execution on one or more multicore processing units to perform operations of a trading system for financial arbitrage. The process 500 of FIG. 5 is a specific example of the exemplary process 400 of FIG. 4B.

The process 500 begins in block 502, in which a software developer identifies, for the trading system to be created, the processing steps included in evaluating sequences of potential trades in a financial arbitrage setting and the data to be evaluated. The data to be processed in a trading system may include trading information received for potential trades with banks or other counterparties to potential trades (e.g., information received from source of data 202A of FIG. 2) as well as information maintained by an operator of the trading system. Examples of the types of information regarding potential trades that may be received are described above. Information maintained by an operator may include information regarding previous trading activity and/or predictions regarding future activity. Information on previous trading activity may include information regarding recent trades and recent denied trades. Information on predictions regarding future activity may include predictions regarding counterparties to potential trades, including whether the counterparties are expected to approve trades in the future. In some cases, if information on previous trading activity indicates that a counterparty has been denying trades, predictions on future trading activity may identify that the counterparty is expected to deny trades. The processing steps for financial arbitrage may include identifying an anticipated profit associated with a sequence of trades of currencies and identifying a likelihood of the sequence of trades being approved by each counterparty of each trade included in the sequence of trades.

Once the steps of the solution and the data to be processed have been identified by the software developer, in block 504, software modules for the steps are created manually and/or automatically. In some embodiments, as discussed below in more detail in connection with FIGS. 6 and 8, template processing modules may be created manually and modules that are instances of the template may be created automatically. In addition, in block 504, the software developer specifies types of data to be evaluated by the software modules. The software developer may create software modules in block 504 that perform operations for determining an anticipated profit associated with a sequence of trades and that perform operations for determining a likelihood of a sequence of trades being approved and/or completed. Software modules may be created for each of the permutations of trades and sequences of trades, such that the financial arbitrage operations are carried out on each potential trade indicated by trading information. Additionally, the software developer may specify the sources of data to be processed by each software module. The sources may include sources providing trading information for banks and other potential counterparties from which trading information may be received and may identify each type of data that may be received from these sources.

In block 506, a software development tool evaluates the software modules created and the types of data specified by the software developer. In evaluating the software modules created by the software developer, the software development tool may identify dependencies between the software modules. For example, the software modules for financial arbitrage may, in some implementations, include one or more modules to calculate anticipated profits associated with potential trades and one or more modules to calculate, based on the profits of individual trades, anticipated profits associated with potential sequences of trades. The software development tool, upon evaluating these modules, may identify that a module for calculating an anticipated profit associated with a sequence of trades is dependent on one or more modules that calculate anticipated profits associated with each of the trades in the sequence. The software modules for financial arbitrage may also, in some implementations, include modules that include primarily arithmetic instructions, such as for calculating anticipated profits, and modules that include primarily logical instructions, such as for comparing potentials for profit between different potential sequences of trades to select a sequence of trades to execute. In some embodiments, processing cores of one or more multicore processing units may have different capabilities and/or configurations, and thus some cores may be able to execute some types of instructions more efficiently than other cores. The software development tool may identify that the modules should be assigned to particular processing cores based on the types of instructions. For example, the software development tool may identify that some of the modules should be assigned for execution to processing cores that are specially adapted for arithmetic operations and others of the modules should be assigned for execution to processing cores that are specially adapted for logical operations. 
The software development tool may carry out a similar process when the target hardware that is to execute the modules that select a trade for execution is a first processing unit without access to a network interface (e.g., a GPU without access to a network interface) and a bridge is not operating locally on the computing device. In implementations of a trading system that operate in such an environment, software modules of the trading system may include a module to communicate an identification of a sequence of trades to be executed to a second processing unit that has access to a network interface (e.g., a CPU). When the second processing unit receives the identification of the sequence of trades, a module or other part of the trading system executing on the second processing unit may execute the sequence of trades, such as by communicating via a network to instruct a bridge to execute the sequence. In some embodiments, therefore, one or more of the software modules may execute instructions to transfer data between a first processing unit and a second processing unit. Upon evaluating the instructions of that software module, the software development tool may identify that the software module should be assigned for execution to a processing core of the first processing unit that is specially adapted for accessing shared memory that is accessible by the second processing unit. In addition to evaluating the instructions to identify processing cores on which a software module should execute, the software development tool may evaluate the instructions to determine how to configure an interpretation performed by an interpretation facility. 
In embodiments in which an interpretation facility interprets instructions of a software module that are formatted in one way and produces sets of instructions that are executable by processing cores of a multicore processing unit, the software development tool may configure the interpretation to be performed in a particular manner based on the evaluation of the instructions of the software modules.
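
The dependency identification and core assignment described above can be sketched in Python. This is an illustrative sketch only: the `Module` record, its instruction counts, the module names, and the two core kinds ("arithmetic" and "logical") are assumptions made for the example and do not describe any particular software development tool.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical description of one software module."""
    name: str
    inputs: list = field(default_factory=list)  # names of modules whose outputs are consumed
    arithmetic_ops: int = 0  # assumed count of arithmetic instructions
    logical_ops: int = 0     # assumed count of logical instructions


def find_dependencies(modules):
    """Map each module name to the known modules it depends on for input."""
    known = {m.name for m in modules}
    return {m.name: [i for i in m.inputs if i in known] for m in modules}


def preferred_core_kind(module):
    """Pick a core kind from the dominant instruction type (assumed heuristic)."""
    if module.arithmetic_ops >= module.logical_ops:
        return "arithmetic"
    return "logical"


modules = [
    Module("trade_profit_1", arithmetic_ops=12, logical_ops=2),
    Module("trade_profit_2", arithmetic_ops=12, logical_ops=2),
    Module("sequence_profit", inputs=["trade_profit_1", "trade_profit_2"],
           arithmetic_ops=8, logical_ops=1),
    Module("select_sequence", inputs=["sequence_profit"],
           arithmetic_ops=1, logical_ops=9),
]

dependencies = find_dependencies(modules)
assignments = {m.name: preferred_core_kind(m) for m in modules}
```

In this sketch the sequence-profit module is found to depend on the two per-trade profit modules, and the comparison-heavy selection module is the only one steered toward a core adapted for logical operations.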

Other forms of evaluation, and specific techniques for conducting the evaluation, are discussed in detail below in connection with FIGS. 8-9.

As a result of the evaluation of block 506, the software development tool may output configuration information for use by one or more management facilities for one or more processing units, and/or may output suggestions to a software developer on how to edit the software modules to improve execution efficiency. The configuration information may include any suitable information, including information on dependencies and relative times at which software modules should be executed and information on processing cores to which software modules should be assigned.

In block 508, the multicore processing unit(s) and the processing cores may be configured according to the configuration information output by the software development tool. Configuring the multicore processing unit(s) may, in some embodiments, include configuring an interpretation performed by an interpretation facility. Configuring the multicore processing unit(s) may also include providing information regarding processing cores to which to assign software modules, dependencies between software modules, or any other suitable information regarding how software modules are to be scheduled for execution, to one or more scheduling facilities of the multicore processing unit(s). Once the scheduling facilities have the configuration information, the scheduling facilities may create a schedule for execution of the software modules according to the configuration information and cause processing cores to be programmed with software modules according to the schedule.

Once the multicore processing unit(s) are configured, in block 510 the multicore processing unit(s) may begin executing the software modules to process trading information and select sequences of potential trades to execute. The sequences of potential trades to execute may be selected based on potential for profit associated with each of the sequences of trades. The potential for profit of a sequence of trades may be based on an anticipated profit associated with the sequence, if the sequence is completed, as well as the likelihood of the potential trades included in the sequence being approved by the counterparties to those potential trades. The likelihood for the sequence may be determined based on information regarding past trading activity, current trading activity, and/or future trading activity. Information regarding past trading activity may include information regarding potential trades that were not selected for execution, potential trades that were selected but were denied and not executed, and/or potential trades that were selected and were executed. Information regarding current trading activity may include information regarding the potential trades that may be selected, such as a source or age of the information regarding the potential trade. Information regarding future trading activity may include predictions regarding future trades based at least in part on past trades, such as information identifying that a particular counterparty is expected to deny trades in the future. Other criteria may additionally or alternatively be used to identify and select desirable sequences of trades for execution, as embodiments are not limited in this respect. One or more modules of the software modules may apply these criteria and select, from among the processing chains and the sequences of potential trades, one or more sequences of potential trades to execute.

In block 512, once a sequence of trades has been selected by the modules in block 510, the trades included in the selected sequence of trades may be executed. To execute trades, in some embodiments the software modules of the multicore processing units may issue an instruction to another software facility of a trading system to identify that trades of a sequence of trades should be executed, and the software facility may execute the trades. In some embodiments, to execute the trades, the software facility may communicate with a bridge. The bridge may be one executing on a separate computing device, as in the example of FIG. 2, or may be one executing on the same computing device as is executing the software modules, or may be arranged for execution on any suitable computing device. Embodiments are not limited to including a bridge or to including a bridge implemented on any particular computing device.

Once the trades are executed in block 512, the process 500 may return to block 510. In block 510, the software modules again evaluate trading information. The trading information evaluated when the process 500 returns to block 510 may include previously-received trading information and new trading information that has been recently received by the trading system and that identifies new trades and/or new terms for trades. In some embodiments, software modules may not evaluate previously-received trading information and may not execute on a processing core until new trading information is received for processing by the processing core. In other embodiments, however, some or all of the trading information may be evaluated by a software module even when the trading information was received at a prior time or was previously processed by the system and/or by a user. A software module may evaluate previously-received data for any suitable reason, including that a counterparty is unlikely to deny a trade to which the previously-received data corresponds, or that the data updates infrequently and may not be out of date. The process 500 may repeat the operations of blocks 510, 512 indefinitely, continuing to execute the software modules and executing identified trades until no more trading information is available, or may stop execution in response to satisfaction of any suitable condition. Embodiments are not limited to continuing or ceasing execution of a system, including a trading system, for any particular reason.

As a result of the process 500 of FIG. 5, a multicore processing unit is configured with software modules to execute operations of a trading system for carrying out financial arbitrage. As a result of the evaluation of the modules by the software development tool and subsequent configuration of one or more multicore processing units, the software modules with which the multicore processing unit is configured may be able to execute quickly and efficiently on the multicore processing unit for processing trading information and identifying sequences of potential trades to be performed.

It should be appreciated that software modules that include executable instructions for performing operations related to complex problems may be created with any suitable instructions based on any suitable division of operations included in a complex problem. Embodiments are not limited to dividing operations for complex problems into software modules or arranging operations for execution by processing units in any particular manner. FIG. 6 shows one illustrative process that may be used for dividing operations of a complex problem into software modules.

The process 600 begins in block 602, in which a software developer identifies the problem that is to be solved using software modules executing on one or more multicore processing units and identifies the operations to be included in a system for producing a solution to that problem. In block 604, the software developer identifies the data that will be processed by those operations. From the data that will be processed and the operations that are to be executed, the software developer may be able to identify, in block 606, a full set of operations to be included in a system for producing a solution to the problem. The system may involve combinations of the operations to be carried out and data to be processed, such that the data to be processed is processed using the operations. In block 606, the software developer identifies, from these combinations of data and operations, a complete algorithm for the solution to the problem that is an interconnected graph of the operations carried out on the data to be processed. The interconnected graph may include, as nodes, operations to be performed on data. The interconnected graph may also include, as connections between nodes, identifications of sources of inputs for nodes and destinations of outputs of nodes.

On the basis of the algorithm identified in block 606, the software developer may then be able to identify in block 608 repeated sets of similar operations performed on similar types of data. The identified repeated sets may be distinct chains of operations that are included in the algorithm for the solution. A chain may include operations that are not dependent on operations of other chains and that can therefore be executed in parallel with operations of other chains when the system for producing a solution to the problem is executed on one or more multicore processing units. Software modules that are to be executed in parallel on processing cores of multicore processing units may be defined on the basis of these processing chains that may be parallelized. For example, in block 610, the software developer reviews the processing chains to identify, between the processing chains, groups of operations that include the same or similar operations carried out on one or more inputs that are the same or similar types of data or data from the same or similar source and that produce the same or similar one or more outputs. When such a group of operations is identified, a software module can be created from these operations that can be used as a template for subsequently building software modules for the processing chains for the algorithm.

In block 612, therefore, the software developer creates a type of software module for each of the groups of operations identified in block 610. A software module type created by the software developer in block 612 may include executable instructions corresponding to the operations of one of these groups. The executable instructions that are included in software module types created in block 612 may be any suitable instructions formatted in any suitable manner. In some embodiments, the software module types may include instructions that are executable by processing cores of a multicore processing unit. In other embodiments, the software module types may include instructions formatted according to an intermediate language that is interpreted by an interpretation facility for a multicore processing unit to produce instructions executable by processing cores of that unit. The software module type may be configured to accept input and produce output based on the types of data and/or sources of data to be processed by the operations of the group.

Operations of a system and of processing chains of the system may be divided into groups in any suitable manner. A group may include any suitable number of operations and, therefore, a software module type may correspond to any suitable number of operations. In some cases, groups of operations identified in block 610 may be able to be subdivided into smaller groups of the same operations that produce similar outputs based on similar inputs. When groups of operations may be subdivided, the software developer may include instructions corresponding to any suitable portion of the operations of a group in a software module type. The portion of the operations to be included in a software module type may be based, for example, on the types of instructions to be included in the software module type and the speed with which these instructions may execute. For example, in some cases a solution to a problem may include performing multiple different operations on a single piece of data. These operations may be identified as one group of operations. Instructions corresponding to these operations may be arranged together, in one software module type. However, arranging the instructions for all of the operations of a group in one software module type may result in slowed execution in some cases. The slowed execution may result because each instruction of the software module type waits for a prior instruction to complete when the instructions are arranged to be executed in series when a software module of the type is executed. If the operations are independent of one another, however, the operations of the group may be able to be subdivided into more groups. Efficiency of execution of instructions corresponding to the operations of the group may be improved through the subdivision of the operations into multiple different software module types. 
Each software module type may then include only a portion of the operations of the group. As such, instructions for the different operations can be executed on different processing cores and parallelized when executed on a multicore processing unit. When the operations are parallelized, the operations may each be performed on the data at the same time. Parallelizing the operations may therefore increase the speed and efficiency with which the software modules execute on the multicore processing units. In some embodiments, software module types may be created using each of the smallest groups of similar operations, performed on similar data to produce similar outputs, that produce an intermediate result and that can be identified in processing chains. Using the smallest identifiable groups may increase the number of operations that are able to execute in parallel on one or more multicore processing units. In some cases, however, increasing the parallelizing of operations may not lead to the most efficient or fastest execution of those operations. Rather, in some cases, executing the operations serially in one software module may lead to the operations being executed more quickly or more efficiently. For example, in some hardware on which modules will be executed, characteristics of memory access and data transfer of the hardware may lead to a longer time spent providing data to two different modules, each requesting different data from memory and needing the data provided to processing cores executing those modules, than providing that same data to one software module executing on one processing core. Additionally, increasing the number of software modules may also increase the number of context switches that are performed when executing the modules. Context switches may create delays, as discussed above. Therefore, parallelizing may lead to a loss of efficiency or execution speed in some cases. 
In cases in which parallelizing would result in a loss of efficiency or execution speed, greater efficiency or speed may be achieved by placing these operations in the same software module. Thus, it should be appreciated that embodiments are not limited to dividing operations of a system into software module types in any particular manner.
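
The trade-off between subdividing a group of operations and keeping it in one software module can be illustrated with a toy cost model. Everything in the sketch below is an assumption made for illustration: the function names, the abstract time units, and the simplifications that data transfers for separate modules queue one after another on shared memory and that a single context-switch cost is charged when the group is split. Real hardware would need to be measured rather than modeled this way.

```python
def serial_time(op_times, transfer_cost):
    """One module on one core: one data transfer, then the operations in series."""
    return transfer_cost + sum(op_times)


def parallel_time(op_times, transfer_cost, switch_cost):
    """One module per operation (assumed model): transfers queue on shared
    memory, one context-switch cost is charged, then the operations overlap."""
    return len(op_times) * transfer_cost + switch_cost + max(op_times)


def should_split(op_times, transfer_cost, switch_cost):
    """True when the toy model predicts that subdividing the group is faster."""
    return parallel_time(op_times, transfer_cost, switch_cost) < serial_time(
        op_times, transfer_cost
    )


# Long operations, cheap transfers: splitting wins in this model (8 vs. 13 units).
compute_heavy = should_split([4, 4, 4], transfer_cost=1, switch_cost=1)

# Short operations, costly transfers: one module wins (11 vs. 6 units).
transfer_heavy = should_split([1, 1, 1], transfer_cost=3, switch_cost=1)
```

Under these assumed costs, the same three-operation group is worth subdividing when computation dominates and worth keeping together when memory transfer dominates, which mirrors the reasoning above.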

Once the types of the software modules are created in block 612, the types of the software modules may be used in block 614, manually and/or automatically through a software process (such as the software development tool), as templates to create instances of the software module types for each of the processing chains identified by the software developer. Where the types of software modules are used automatically through a software process to generate instances of software modules, the generation of the software modules may be done without user input. For example, a user may trigger generation of the modules based on the template, and the software process may carry out the generation without further input from the user to perform the generation.

Each of the software modules that are instances of a software module type may be arranged with particular sources of inputs and destinations of outputs that correspond to the manner in which the software modules will be used in the system to produce a solution to the problem. The sources of inputs and destinations of outputs for a software module may include other software modules. By creating instances of the software module types, the software developer creates an interconnected set of software modules that, when executed, cause one or more multicore processing units to determine the solution to the problem. The interconnected set of software modules may correspond to the interconnected graph identified by the software developer in block 606.
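
Automatic generation of module instances from module types can be sketched as follows. The `ModuleType` and `ModuleInstance` records, the naming scheme, and the two example types are hypothetical and serve only to show how a template might be instantiated once per processing chain, with each instance wired to its sources of input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleType:
    """Hypothetical template for a group of operations."""
    name: str
    input_types: tuple  # names of module types that feed this type within a chain


@dataclass(frozen=True)
class ModuleInstance:
    """Hypothetical instance of a module type, bound to one processing chain."""
    name: str
    type_name: str
    sources: tuple  # names of the instances that supply this instance's inputs


def instantiate_chain(chain_id, module_types):
    """Create one instance per module type, wired to instances of the same chain."""
    return [
        ModuleInstance(
            name=f"{t.name}[{chain_id}]",
            type_name=t.name,
            sources=tuple(f"{src}[{chain_id}]" for src in t.input_types),
        )
        for t in module_types
    ]


template = [
    ModuleType("read_input", input_types=()),
    ModuleType("combine", input_types=("read_input",)),
]

# Once triggered, generation runs over every chain with no further user input.
instances = [
    inst
    for chain in ("chain_a", "chain_b")
    for inst in instantiate_chain(chain, template)
]
```

The resulting list of instances, together with their `sources`, forms an interconnected set of modules of the kind described above.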

Once the software modules for each processing chain are created in block 614, the process 600 ends. As a result of the process 600, a set of software modules is created that may be stored, such as on a storage (e.g., disk) of a computing device in the development environment. The modules may then be provided to a software development tool for evaluation and/or may be provided to a multicore processing unit to be executed.

The process 600 of FIG. 6 for creating software module types and software modules was described generally, without reference to any particular problem or operations to be carried out for solving a problem. FIGS. 7A-7C continue the example of financial arbitrage discussed above and provide an example of a manner in which software module types may be created for a complex problem.

The process 700 of FIG. 7A illustrates a set of operations that may be performed by a trading system to identify, from trading information, a sequence of potential trades that should be executed. The process 700 begins in block 702, in which the trading system identifies, for each potential foreign exchange trade that a counterparty has offered to make, exchange rates for the trade. The exchange rates may be determined on the basis of trading information received from the counterparty. In block 704, the trading system identifies sequences of potential trades by identifying available combinations of potential trades. The available combinations of potential trades may be, in some embodiments, all permutations of potential trades. In other embodiments, one or more constraints may be imposed in determining which permutations of potential trades are available as sequences of potential trades. For example, a constraint may be imposed on the maximum number of potential trades to include in a sequence. As another example, a constraint may be imposed that only one potential trade per counterparty is permitted in a sequence of potential trades. It should be appreciated that any suitable constraints may be imposed, as embodiments are not limited in this respect.
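
The enumeration of block 704 under the two example constraints can be sketched in Python. The `PotentialTrade` record, the function name, and the sample trades are assumptions for illustration; the sketch applies only the maximum-length and one-trade-per-counterparty constraints named above and imposes no others.

```python
from itertools import permutations
from typing import NamedTuple


class PotentialTrade(NamedTuple):
    """Hypothetical record for one potential trade offered by a counterparty."""
    counterparty: str
    sell: str
    buy: str


def candidate_sequences(trades, max_length, one_per_counterparty=True):
    """Enumerate ordered sequences of potential trades under the example constraints."""
    sequences = []
    for length in range(1, max_length + 1):
        for seq in permutations(trades, length):
            counterparties = [t.counterparty for t in seq]
            if one_per_counterparty and len(set(counterparties)) != len(counterparties):
                continue  # a counterparty appears more than once in this sequence
            sequences.append(seq)
    return sequences


trades = [
    PotentialTrade("Bank1", "USD", "EUR"),
    PotentialTrade("Bank1", "EUR", "GBP"),
    PotentialTrade("Bank2", "GBP", "USD"),
]
constrained = candidate_sequences(trades, max_length=2)
unconstrained = candidate_sequences(trades, max_length=2, one_per_counterparty=False)
```

Because the number of permutations grows quickly with the number of potential trades, constraints such as these also bound how many processing chains have to be instantiated.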

Once the available sequences of potential trades are identified, the trading system may also identify for each sequence an anticipated profit associated with the sequence. The anticipated profit for each sequence may be identified on the basis of the price of each of the potential trades in the sequence, which is the exchange rate offered for each of the trades by the counterparties to those potential trades. The anticipated profit may also, in some cases, be identified on the basis of a proposed volume for a trade that is specified by a counterparty to the potential trade. However, embodiments are not limited to operating in a scenario in which a counterparty proposes a volume for a potential trade in addition to a price. Thus, in some cases in which a volume is not proposed, an anticipated profit may be identified on the basis of an exchange rate and not a volume.

The trading system may also determine, in block 706, a likelihood of each of the potential trades of a sequence being approved by the counterparties to those potential trades and, thus, the likelihood of the sequence being approved and completed. The likelihood of approval for a potential trade may be based on any suitable information, as embodiments are not limited in this respect. As discussed above, the likelihood may be based on information regarding previous trading activity, information regarding current trading activity, and/or predictions regarding future trading activity. In some embodiments, the likelihood of a potential trade being approved may be based on information including an age of the trading information identifying the potential trade, a number of recent trades made with the counterparty to the potential trade, and a number of recent rejected trades that were attempted with the counterparty. On the basis of the anticipated profit identified for each sequence of potential trades and the likelihood of approval of each sequence, the trading system may determine in block 708 a potential for profit associated with each of the sequences and select for execution one of the sequences of trades. The sequence having the highest potential for profit out of the sequences, for example, may be selected by the trading system. It should be appreciated that, in some cases, a sequence of trades having the highest potential for profit may not be the sequence having the highest anticipated profit. Rather, the potential for profit for a sequence of trades may be based on the anticipated profit as well as the likelihood of the sequence being approved, such that a sequence with a high anticipated profit may not have a high potential for profit. Once the selection is made in block 708, the process 700 ends.

As discussed above in connection with FIG. 6, once a software developer has identified the operations that may be carried out as part of a solution to a complex problem, the software developer may also identify data to be processed by those operations and identify a graph of operations carried out on data that represents the algorithm that is to be executed for the solution to the problem. FIG. 7B illustrates an example of such a graph including combinations of operations and data to be processed. The example of FIG. 7B illustrates operations including identifying exchange rates for trading currencies with counterparties (e.g., exchange rate “Rate₁” for exchanging currencies “Curr₁” and “Curr₂” with counterparty “Bank₁”) and identifying a running anticipated profit for a sequence by multiplying the rates of each trade to determine an overall rate. The operations illustrated in FIG. 7B also include determining a likelihood of approval for the sequence of trades by multiplying the probabilities for approval for the individual trades of a sequence. Lastly, FIG. 7B illustrates selecting between two sequences of potential trades based on the overall rate (which may indicate an anticipated profit for the sequence) and likelihood of approval for the sequences. As discussed above, while not illustrated in the example of FIG. 7B, it should be appreciated that some trading systems operating in accordance with techniques described herein may consider a volume of a potential trade in addition to exchange rate when determining an anticipated profit associated with a potential trade and sequence of potential trades. In such embodiments, trading information received from a counterparty may identify a volume of currency that the counterparty is willing to trade at a specified exchange rate, and this volume may be considered by a trading system as part of determining an anticipated profit for a potential trade.
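
The two multiplications illustrated in FIG. 7B can be written out directly. The function names and the sample rates and probabilities below are invented for illustration, and trade volume is ignored, as in the figure.

```python
from math import prod


def overall_rate(rates):
    """Overall exchange rate for a sequence: the product of the per-trade rates."""
    return prod(rates)


def sequence_likelihood(approval_probabilities):
    """Likelihood of the whole sequence being approved: the product of the
    per-trade approval probabilities."""
    return prod(approval_probabilities)


# Invented sample values for a three-trade sequence.
rates = [0.5, 4.0, 0.625]
approvals = [0.5, 0.75, 0.5]

rate = overall_rate(rates)
likelihood = sequence_likelihood(approvals)
```

An overall rate above 1 means that, ignoring volume, the sequence would return more of the starting currency than was put in; the likelihood shrinks with every additional trade whose approval is uncertain.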

The process 600 of FIG. 6 also includes steps for identifying processing chains in the operations included in the graph, which are parallel sets of similar operations. As should be appreciated from the illustration, FIG. 7B includes two processing chains 710 and 712, one for each sequence of potential trades. Each of the processing chains 710, 712 includes the same sets of operations that will correspond to the same executable instructions, and these operations are carried out on similar types of data. Accordingly, as in the example of FIG. 6, these two processing chains can be evaluated to determine types of software modules to be created.

From an analysis of the operations of the processing chains, four different types of software module can be identified. These four types of software module, as well as instances of them corresponding to the data processed in the example of FIG. 7B, are illustrated in FIG. 7C. FIG. 7C illustrates a set of software modules of four different types 720-726, arranged in different rows identifying a manner in which the software modules may be parallelized. The types of software module identified from the processing chains of FIG. 7B include software modules of type 720 for identifying an exchange rate for a potential trade from trading information received from the counterparty for that potential trade. The software modules of type 720 do not depend on one another, but rather only depend for execution on receiving input trading information. Therefore, the software modules of type 720 are eligible to be executed in parallel with one another.

The software modules also include modules of type 722 that accept as input the exchange rates determined by each of the modules of type 720, process the exchange rates to determine an overall exchange rate for the sequence of potential trades, and produce as output the overall exchange rate for the sequence that represents an anticipated profit from the sequence. Because the software modules of type 722 depend on modules of type 720 for input, a software module of type 722 should be executed in a multicore processing unit after the time at which the modules of type 720 on which it depends execute. Software modules of type 722 may, however, be executed in parallel with modules of type 720 on which the modules of type 722 do not depend. In some cases, the modules of type 722 may be executed in parallel with modules of type 724. As should be appreciated from the graph of FIG. 7C, modules of type 722 do not accept input from modules of type 724 and are therefore not dependent on modules of type 724.
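
One way to turn dependencies like these into an execution order is to group modules into "waves" whose members have no unmet dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the module labels and the wave-based schedule are assumptions for illustration, not the scheduling method of any particular multicore processing unit.

```python
from graphlib import TopologicalSorter


def execution_waves(depends_on):
    """Group modules into waves; every module in a wave has all of its
    dependencies satisfied by earlier waves and may run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable listing
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical instances for one sequence of two potential trades.
depends_on = {
    "rate_1 (type 720)": [],
    "rate_2 (type 720)": [],
    "overall_rate (type 722)": ["rate_1 (type 720)", "rate_2 (type 720)"],
    "likelihood (type 724)": [],
}
waves = execution_waves(depends_on)
```

Here the two type 720 modules and the type 724 module land in the first wave and the type 722 module in the second. This is only one valid schedule: because the type 724 module has no dependencies in this graph, it could equally run alongside or after the type 722 module.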

The software modules of type 724 include instructions to determine a likelihood of a sequence of trades being approved and to output the determined likelihood. The software modules of type 724 are not illustrated in FIG. 7C as being dependent for input on other software modules, and may therefore be executed in parallel with modules of type 720, in parallel with modules of type 722, or before or after modules of types 720 or 722. Lastly, the types of modules included in the example of FIG. 7C include a type 726 that evaluates the sequences of potential trades to identify desirable sequences of potential trades. The evaluation module of type 726 selects one or more sequences of potential trades to execute. To do so, the evaluation module of type 726 may accept as input, for each of two different sequences, the overall rate of exchange for the sequence and the likelihood of the sequence being approved, compare the potential for profit associated with each sequence of potential trades, and select a sequence of trades to be performed that has the highest potential for profit. In the embodiment of FIG. 7C, the potential for profit of a sequence of potential trades is determined by the module of type 726 by weighting an anticipated profit of a sequence of trades by the likelihood of the sequence of trades being approved and completed. The sequence of trades having the highest potential for profit may therefore be identified as the sequence of trades having the highest weighted anticipated profit. Thus, the sequence of trades having the highest potential for profit may not be the sequence having the highest anticipated profit.
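
The weighting performed by a module of type 726 can be sketched as a product of anticipated profit and likelihood. The multiplication is one plausible reading of "weighting", and the function names, sequence labels, and numbers are invented for the example.

```python
def potential_for_profit(anticipated_profit, likelihood):
    """Anticipated profit weighted by the likelihood of approval (assumed product)."""
    return anticipated_profit * likelihood


def select_sequence(candidates):
    """Return the name of the candidate with the highest weighted anticipated profit.

    `candidates` maps a sequence name to (anticipated_profit, likelihood).
    """
    return max(candidates, key=lambda name: potential_for_profit(*candidates[name]))


# Invented values: the first sequence promises more profit but is less likely
# to be approved by its counterparties.
candidates = {
    "sequence_710": (0.010, 0.50),
    "sequence_712": (0.008, 0.90),
}
selected = select_sequence(candidates)
highest_anticipated = max(candidates, key=lambda name: candidates[name][0])
```

With these values the second sequence is selected even though the first has the higher anticipated profit, which is the situation described above.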

The four types of software module 720-726 of FIG. 7C may form a template for a processing chain for a financial arbitrage problem. Each processing chain includes operations corresponding to these four types of software module. By creating instances of each of these four types that are configured with particular sources of inputs and particular destinations of outputs, which tie the software modules together when the inputs and outputs are other software modules, the graph shown in FIG. 7C, which represents the operations of a system for producing a solution to the financial arbitrage problem, can be created.

In accordance with techniques described herein, software modules and/or types of software modules for a solution to a complex problem may be evaluated by a software development tool. The software development tool may evaluate the modules in any suitable manner and select a configuration for one or more multicore processing units based on the evaluation. The configuration that is selected may be selected from among multiple different configurations, each of which represents a different set of options, constraints on scheduling, modules, or other factors that may be incorporated into a configuration. In some embodiments, the multiple different configurations from which the configuration is selected may not each be specifically defined in advance, but rather may be available as options by setting different configuration factors differently. Embodiments are not limited to carrying out any particular process for evaluating software modules and selecting a configuration. Examples of processes that may be carried out by software development tools in accordance with techniques described herein are described below in connection with FIGS. 8 and 9. Further, as discussed below in connection with FIG. 11, in some embodiments a configuration process may be repeated over time and result in an iterative selection of different configurations, as the software development tool may identify over time ways to improve an execution efficiency of the system.

Prior to the start of the process 800, a software developer identifies a problem to be solved, reviews operations that form a part of the solution to the problem and data to be processed as part of the solution, and creates types of software modules based on that review. The types of software modules that are created may be, as discussed above, templates for software modules that will form a part of the solution. Software modules to be executed on processing cores of multicore processing units may be created as instances of these template software modules. In addition, the software developer arranges the template software modules in a template processing chain, such that the software development tool is able to analyze the template software modules in the context of other modules with which the modules are to exchange data. In the example of FIG. 8, the template processing chain identifies the template software modules as well as input/output interconnections between the modules of the template processing chain. The input/output connections may identify the types of data to be processed by each module, the sources of inputs for modules, and the destinations of outputs of modules.

The process 800 begins in block 802, in which the software development tool receives the template processing chain including the template software modules, and the specification of data to be processed by software modules based on the template software modules. As discussed above in connection with FIG. 7C, a template processing chain may include multiple different software modules that may be included in each of the processing chains of a solution to a problem. The processing chains may, in some cases, identify the operations to be performed on data related to the problem. In the case of financial arbitrage, for example, the template processing chain may identify the operations to be performed for processing sequences of potential trades and the arrangement of the operations into software modules. The specification of data may include any suitable information describing the data to be processed by the modules or the manner in which the data is to be processed. For example, the specification may include information defining types of and/or sources of data to be processed. The specification may also include information defining or constraining a manner in which the data can be processed together in processing chains. In the case of financial arbitrage, for example, the specification may identify data to be included in trading information, such as prices of potential trades and identifications of counterparties to potential trades. The specification of data for financial arbitrage may further include constraints on the way in which potential trades can be combined to create chains of potential trades. For example, a constraint may be imposed that a sequence of trades cannot include more than one trade with the same counterparty. Though, it should be appreciated that embodiments are not limited to receiving modules arranged in a template processing chain in any particular manner, nor are embodiments limited to receiving a specification of data in any particular manner.

In block 804, the software development tool evaluates the template software modules, including the instructions included in the template software modules and data to be processed by the template software modules. The template software modules may be evaluated to determine characteristics of the instructions included in each template software module and that will be included in each software module that is an instance of the template. The specification of the data and/or examples of the data itself may be evaluated by the software development tool to identify characteristics of the data, such as a frequency of variability of the data or a manner in which the data varies.

The software development tool also, in block 806, uses the template processing chain and the specification of data to generate multiple processing chains. Each of the processing chains generated in block 806 includes software modules corresponding to the template software modules of the template processing chain. The software development tool generates the multiple different processing chains by reviewing the specification of data received in block 802 that identifies data to be processed by the modules of the template processing chain. When the software development tool observes, in data recited in the specification of data received in block 802, pieces of data that correspond to inputs of template software modules for the template processing chains that can be combined in a way that satisfies the constraints for combining data, the software development tool replicates the template processing chain for the pieces of data. By replicating the template processing chain, the software development tool creates instances of the software modules of the template chain and configures the instances with sources of inputs and destinations of outputs that correspond to the pieces of data.

The software development tool may identify the pieces of data that may be combined in any suitable manner, as embodiments are not limited in this respect. Sets of data may be predefined in some embodiments, and specified in the specification of data received in block 802. In other embodiments, the software development tool may evaluate the specification of data and identify permutations of the data that satisfy the constraints for combining data.

By performing the generation of block 806 for each set of data defined by the specification of data, the software development tool can use the template processing chain to create a full graph of interconnected software modules for execution on one or more multicore processing units that processes the data identified by the specification received in block 802.
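The generation of block 806 can be sketched as follows, assuming the variant in which the tool enumerates permutations of the data and keeps those that satisfy the constraints. The trade records, the module dictionaries, and the additional chaining and starting-currency constraints are hypothetical and are included only to make the example concrete; the evaluation module of type 726, which spans chains, is omitted.

```python
from itertools import permutations


def satisfies_constraints(sequence, start_currency):
    # Constraint from the description: at most one trade per counterparty.
    counterparties = [trade["counterparty"] for trade in sequence]
    if len(set(counterparties)) != len(counterparties):
        return False
    # Assumed constraints: the sequence starts in a given currency and each
    # trade sells the currency that the previous trade bought.
    if sequence[0]["sell"] != start_currency:
        return False
    return all(a["buy"] == b["sell"] for a, b in zip(sequence, sequence[1:]))


def instantiate_chain(sequence, chain_id):
    # Replicate the template chain: one type 720 module per trade, then one
    # type 722 module and one type 724 module for the whole sequence.
    rates = [
        {"type": 720, "id": f"c{chain_id}.rate{i}", "inputs": [trade["id"]]}
        for i, trade in enumerate(sequence)
    ]
    overall = {"type": 722, "id": f"c{chain_id}.overall",
               "inputs": [module["id"] for module in rates]}
    approval = {"type": 724, "id": f"c{chain_id}.approval",
                "inputs": [trade["id"] for trade in sequence]}
    return rates + [overall, approval]


def generate_chains(trades, length, start_currency):
    chains = []
    for sequence in permutations(trades, length):
        if satisfies_constraints(sequence, start_currency):
            chains.append(instantiate_chain(sequence, len(chains)))
    return chains


trades = [
    {"id": "t1", "counterparty": "Bank1", "sell": "Curr1", "buy": "Curr2"},
    {"id": "t2", "counterparty": "Bank2", "sell": "Curr2", "buy": "Curr3"},
    {"id": "t3", "counterparty": "Bank1", "sell": "Curr3", "buy": "Curr1"},
    {"id": "t4", "counterparty": "Bank3", "sell": "Curr3", "buy": "Curr1"},
]
chains = generate_chains(trades, length=3, start_currency="Curr1")
```

Here the sequence t1, t2, t3 is rejected because it would include two trades with Bank1, so only the sequence t1, t2, t4 yields a processing chain.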

The software development tool may then, in block 808, evaluate the graph of software modules and the interconnections between the modules to determine characteristics of instances of the software modules and the manner in which the software modules interrelate and execute.

Based on the evaluations of blocks 804 and 808, the software development tool selects a configuration for the multicore processing units and processing cores of the multicore processing units from among multiple potential configurations. In block 812, the software development tool produces configuration inputs for the selected configuration. Once the configuration information is produced in block 812, the process 800 ends. Following the process 800, the configuration information produced by the software development tool may be used to configure one or more multicore processing units. Software modules may then be executed quickly and efficiently on processing cores of the multicore processing units based on the configuration.

In connection with FIG. 8, examples of types of evaluation that are conducted by a software development tool reviewing software modules were not discussed in detail. Examples of the types of evaluations that may be carried out by a software development tool operating in accordance with techniques described herein are described in detail in connection with FIG. 9.

Similar to FIG. 8, prior to the start of the process 900 of FIG. 9, a software developer identifies a problem to be solved, reviews operations that form a part of the solution to the problem and data to be processed as part of the solution, and creates types of software modules based on that review. The types of software modules that are created may be, as discussed above, templates for software modules that will form a part of the solution. The software module types may include any suitable instructions formatted in any suitable manner. The instructions may include instructions that are executable by processing cores or instructions arranged according to an intermediate language that is not executable by processing cores of the multicore processing unit(s) on which the modules are to be executed. In addition, the software developer arranges the template software modules in a template processing chain, such that the software development tool is able to analyze the template software modules in the context of other modules with which they communicate. In the example of FIG. 9, the template processing chain identifies the template software modules as well as data to be processed by the modules and interconnections between the modules, such as input/output interconnections.

The process 900 begins in block 902, in which the software development tool evaluates instructions of template software modules provided to the software development tool. The software development tool may evaluate the instructions of the template software modules to identify types of instructions included in each of the template modules and that will therefore be included in each of the instances of that template created to process specific data.

The instructions included in each of the template software modules may be evaluated in block 902 to determine whether any of the template software modules includes instructions of a type that one or more of the processing cores is specially adapted to execute. For example, if the software development tool determines that a template software module includes logical instructions, the software development tool may determine that instances of that template software module should, where possible, be assigned to a processing core that executes logical operations quickly and efficiently. Such logical operations may be assigned, for example, to a processing core of a central processing unit or to a processing core of a graphics processing unit that is specially adapted for executing logical instructions. Similarly, if the software development tool determines that a template software module includes memory access operations to exchange data with other processing cores, the software development tool may determine that instances of that template software module should, where possible, be assigned to a processing core that executes such operations quickly and efficiently. In some embodiments, rather than merely evaluating whether a module includes these instructions, a number of such instructions may be determined for each module. Modules with larger numbers of these instructions, such as a number larger than other modules or a number above a threshold, may be assigned to processing cores specially adapted to perform such processing.
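The threshold-based assignment described above might be sketched as follows. The instruction categories, the core-type names, and the default threshold are hypothetical; the description does not specify how instructions are classified or what threshold is used.

```python
from collections import Counter

# Hypothetical mapping from instruction category to a core type that is
# assumed to be specially adapted for that category.
SPECIALIZED_CORES = {
    "logical": "logic_core",
    "memory_access": "memory_core",
}


def preferred_core(instruction_kinds, threshold=3):
    """Return a preferred core type for a template software module.

    instruction_kinds lists one category label per instruction in the module.
    """
    counts = Counter(instruction_kinds)
    # Find the specialized category with the most instructions in the module.
    kind, count = max(
        ((k, counts[k]) for k in SPECIALIZED_CORES),
        key=lambda pair: pair[1],
    )
    # Prefer a specialized core only when the count reaches the threshold.
    return SPECIALIZED_CORES[kind] if count >= threshold else "general_core"


logic_heavy = ["logical"] * 4 + ["arithmetic"]
memory_heavy = ["memory_access"] * 3 + ["logical"]
mixed = ["arithmetic", "logical"]
```

With these inputs, preferred_core(logic_heavy) yields "logic_core", preferred_core(memory_heavy) yields "memory_core", and preferred_core(mixed) falls back to "general_core".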

In embodiments in which the module types include instructions in an intermediate language, the instructions included in each of the template software modules may additionally or alternatively be evaluated in block 902 to determine how the intermediate language instructions will be interpreted by an interpretation facility. An interpretation facility may interpret different intermediate language instructions as corresponding to different instructions of an instruction set that may be executed by a processing core. In some cases, an operation to be carried out may be representable in intermediate language in multiple different ways, as multiple different sets of instructions. The interpretation facility may interpret the different sets of intermediate language instructions differently and may produce different instructions, some of which may execute on processing cores more quickly than others. Further, the interpretation facility may interpret a set of intermediate language instructions differently based on the data that is to be processed, such that a different set of instructions may be produced by the interpretation facility based on characteristics of the data or the way in which the instructions will operate on the data. For example, in cases in which the same type of operation is to be performed on multiple pieces of data, the interpretation facility may by default produce Single Instruction, Multiple Data (SIMD) instructions for processing the data. The interpretation facility may produce the SIMD instructions even if the intermediate language instructions are not written in a SIMD format, because of the interpretation facility's default rule that SIMD instructions should be used where possible. In embodiments in which the software module types of a template processing chain are formatted using an intermediate language, the software development tool may therefore evaluate intermediate language instructions to determine how an interpretation facility will interpret the instructions.

In block 904, once the software development tool has created software modules based on the templates and on the data to be processed by the software modules, the software development tool may evaluate the instructions of the instances. To evaluate instructions of the instances, the software development tool may evaluate the instructions to identify duplicate modules, superfluous instructions, and dependencies between the software modules.

The software development tool may identify, as duplicate modules, modules that execute the same instructions on the same inputs to produce the same outputs. Duplicates may occur for a variety of reasons, including overlap in the data sets to be processed by software modules. In the example of FIG. 7C, for instance, because the two sequences of potential trades shown in the figure both include a trade of currency “Curr₂” for currency “Curr₃” with entity “Bank₂,” when instances of template software modules are created for those processing chains, the resulting graph may include two modules that each determine the rate “Rate₂” for the same trade. If such a duplicate module were left in the graph, then the multicore processing unit(s) may duplicate the execution of these instructions. This duplication may be unnecessary and undesirable. When a processing core executes the duplicate software module, that processing core is not executing another software module that may also need to execute. The duplication may therefore result in a slowdown of the execution of the software modules that may be undesirable. A similar evaluation may be made to identify redundant software modules that have similarities to other software modules, such as including the same or similar operations, operating on the same or similar inputs, or producing the same or similar outputs. A redundant software module may not be a duplicate of another software module because of a difference with the other software module, but may be redundant because the similarity between the software modules may mean that the two software modules could be merged into one software module. A redundant software module may be undesirable for similar reasons as a duplicate software module. However, as discussed below, in some cases a duplicate or redundant software module may be desirable and may increase execution speed or efficiency. For example, in some cases discussed below, a duplicate or redundant software module may free resources or reduce the time another software module may be waiting for input.

Superfluous instructions may be sets of instructions that are unnecessary to execute. An example of a set of superfluous instructions is an instruction to add 1 to a variable, followed at a later time by an instruction to subtract 1 from the variable, when the variable was not used between the two instructions. Because the variable was not used, the addition and subtraction instructions do not impact execution of the software modules or any other process in a substantive way, and thus the instructions may be superfluous. Another example of superfluous instructions includes calculating and storing a value that is not used by the software modules, or any other process executing on the processing unit(s). Instructions that do not substantively affect the operations of the software modules or other processes may be superfluous. Superfluous instructions may be removed to increase execution speed and efficiency.
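The second example above, a value that is calculated and stored but never used, can be detected with a backward pass over a module's instructions. The sketch below is a standard dead-code elimination pass over straight-line code, offered as one possible approach rather than the approach of any particular embodiment; the (target, operation, operands) instruction format is hypothetical, and the sketch does not handle the add-then-subtract example.

```python
def remove_unused_computations(instructions, outputs):
    """Drop instructions whose results never reach a module output.

    instructions: list of (target, operation, operand_names) tuples in
    execution order (hypothetical straight-line format, no branches).
    outputs: names of values that the module must still produce.
    """
    live = set(outputs)  # values still needed by later instructions
    kept = []
    for target, operation, operands in reversed(instructions):
        if target in live:
            kept.append((target, operation, operands))
            live.discard(target)   # this instruction defines the value
            live.update(operands)  # its operands are now needed
        # Otherwise nothing downstream uses the result, so it is dropped.
    kept.reverse()
    return kept


program = [
    ("rate", "load", ()),
    ("scratch", "mul", ("rate",)),  # computed and stored, but never used
    ("profit", "sub", ("rate",)),
]
cleaned = remove_unused_computations(program, outputs={"profit"})
```

After the pass, the instruction that computes "scratch" is gone, while the instructions that produce "rate" and "profit" remain in their original order.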

Dependencies between the software modules may be analyzed by the software development tool for the purpose of identifying constraints on scheduling execution of software modules. If one software module receives as input a value output by another software module, then the software module may be dependent on the other software module from which it receives an output value. Because the software module is dependent, the software module should be scheduled for execution after execution of the other software module. By scheduling the software module for later execution, the value calculated and output by the other software module may be available when the software module is to be executed. Dependencies of software modules may be determined in any suitable manner, including by reviewing destinations of outputs and sources of inputs from software modules through reviewing stored information about the software modules or monitoring execution of the software modules.
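One way to turn such dependencies into scheduling constraints is a topological ordering that groups modules into waves, where every module in a wave has all of its inputs available and may run in parallel with the others in that wave. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the module names are hypothetical instances of the four module types of FIG. 7C.

```python
from graphlib import TopologicalSorter


def schedule_in_waves(dependencies):
    """Group modules into waves that respect input dependencies.

    dependencies maps each module to the set of modules whose outputs it
    consumes. Modules in the same wave do not depend on one another.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # modules whose inputs are all available
        waves.append(sorted(ready))
        sorter.done(*ready)         # mark the wave as executed
    return waves


dependencies = {
    "overall_722": {"rate1_720", "rate2_720", "rate3_720"},
    "evaluate_726": {"overall_722", "approval_724"},
}
waves = schedule_in_waves(dependencies)
```

For this graph, the first wave contains the three type 720 modules together with the type 724 module, the second wave contains the type 722 module, and the third contains the type 726 module.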

In block 906, the software development tool may test execution of the software modules, such as using sample input data, to monitor execution characteristics of the software modules and monitor characteristics of the data. To monitor execution characteristics, the software development tool may request that the modules be executed on one or more multicore processing units. In embodiments in which the software modules of the template processing chain are arranged in an intermediate language, the software development tool may, in block 906, request that an interpretation facility interpret the software modules and produce instructions that are executable on processing cores of the multicore processing unit(s). In some embodiments, a management facility for a multicore processing unit may be able to monitor performance of the multicore processing unit and produce data describing the performance. The performance data may include any suitable information. The performance data may indicate, for example, how one or more instructions of a software module were executed, how many times requested data was not available in a cache and was requested from other storage, how much time a software module spent waiting for an input to be available, or any other information describing how the software modules were executed by the multicore processing unit(s). Following execution of the software modules on the multicore processing unit(s), the software development tool may communicate with the management facility for the multicore processing unit to retrieve the performance data. From an evaluation of the performance data, the software development tool may be able to determine which of the software modules are executing slowly and causing bottlenecks in the execution of the software modules. When the software development tool detects from the performance data a bottleneck in execution of the software modules, the software development tool may respond in any suitable manner, including by diagnosing a source of the bottleneck and/or by attempting to eliminate the bottleneck. To diagnose a source of the bottleneck, the software development tool may examine a time at which the bottleneck occurs and one or more software modules executing on one or more cores at that time, or any other information regarding a context of the bottleneck. The software development tool may determine, from this information, the instructions that were executing at the time of the bottleneck. The software development tool may evaluate the instructions that are executing slowly and causing the bottleneck, and/or may further review the types of the instructions or parameters of the instructions to determine a possible cause of the bottleneck. For example, the software development tool may determine that a delay is related to a memory access operation that is requesting a large amount of data that is not present in a local cache of the multicore processing unit, and the delay is caused by waiting for retrieval of the data from disk. To monitor characteristics of the data in block 906, the software development tool may monitor the extent to which the data changes over time, such as a number of bits in the data that change at a given time when new data is received and is to be provided to software modules for processing.

As part of executing the software modules in block 906, the software development tool may evaluate a number of modules that are to be executed in parallel at one time. The number of modules to be executed together at one time may affect the execution efficiency of software modules. As more software modules are executed at one time, if the number of software modules to be executed is greater than the number of available cores, more context switches for more processing cores have to be carried out to swap modules on the processing cores. This can increase execution times. Additionally, as more modules are executed in parallel, the modules may compete for resources and lengthen the execution times for each module. However, it may also be the case that, as more software modules are executed at one time, the number of operations executed in parallel increases, which can decrease execution times. Accordingly, to determine the most efficient number of modules to execute at any time, the software development tool executes the software modules on processing cores of one or more multicore processing units. Following execution of the software modules, the software development tool retrieves performance data for the multicore processing units from a management facility. Performance data, as mentioned above, may include any suitable data regarding performance of the software modules and/or the processing cores. In some cases, the performance data may include information regarding a speed with which instructions are executed and a speed with which context switches are made.

In response to the evaluations of blocks 902-906, the software development tool may produce configuration information for one or more multicore processing units. The configuration information may include the software modules themselves, settings for hardware and/or software configuration parameters of a multicore processing unit, information regarding how an interpretation process should be performed by an interpretation facility, and/or information identifying constraints on scheduling of execution of the software modules.

Constraints on scheduling of execution of software modules may include any suitable information regarding a manner in which modules should be executed, an absolute timing of execution of software modules, a timing of execution of software modules relative to other software modules, or any other constraint on scheduling. Scheduling constraint information may include, for example, information identifying dependencies between software modules, the number of software modules that may be executed at one time, types of processing cores to which types of software modules should be assigned, or other information identifying how modules should be scheduled for execution.

As part of generating the software modules, the software development tool may, in block 908, modify the software modules. Modifying the software modules may include modifying individual software modules and/or modifying collections of software modules generated by the software development tool based on the template software modules. To modify an individual software module, as discussed below, the software development tool may make changes to the instructions included within a software module or a template software module of the template processing chains. To modify a collection of software modules, the software development tool may add software modules to a collection or remove software modules from the collection. The software development tool may also modify a collection of software modules by editing interconnections between the software modules, including by editing the inputs and outputs of software modules.

The software development tool may modify software modules in any suitable manner to eliminate inefficiencies or otherwise increase the speed and efficiency of execution of software modules.

In some embodiments, for example, a software development tool may modify software modules by modifying instructions included in the software modules. As discussed above, operations that may be performed as part of a system for producing a solution to a complex problem may, in some cases, be able to be performed using various different sets of instructions. Some of the sets of instructions may execute more efficiently than others, or may execute more efficiently than others on a particular type of processing core to which a software module is to be assigned. Accordingly, the software development tool may modify instructions included in a software module such that the software module includes instructions, to carry out an operation, that will execute quickly and efficiently. As also discussed above, in some embodiments the software development tool may evaluate software modules that are formatted according to an intermediate language that is not executable on processing cores of a multicore processing unit on which the modules are to be executed. The intermediate-language instructions of the software modules may instead be interpreted by an interpretation facility to produce instructions that will be executed by a processing core. As discussed above, an interpretation facility may interpret some intermediate language instructions differently than others and an interpretation of some intermediate language instructions may result in instructions that would execute more quickly or efficiently than others. Accordingly, in some embodiments, the software development tool may modify a software module such that the module includes intermediate language instructions for which an interpretation would produce instructions that would execute quickly and efficiently.

As another example, a software module may include instructions to store data, and some of these instructions may identify a location at which the data is to be stored. For example, some instructions may identify that data should be stored in a memory to which a processing core may have preferential access, such as an on-chip cache exclusive to a processing core or an on-chip block cache that is accessible to one or more other processing cores of a block. Based on the manner in which the data is to be used during execution of the software modules, such as the frequency of use of the data by the module or whether the data is used by other modules, efficiencies may be gained by storing this data in particular memory locations. Accordingly, based on evaluating the way in which data is to be used during execution of the software modules, memory access operations of a software module may be edited to change a memory location at which data is to be stored.

As another example, if the software development tool identified a duplicate module in the software modules that is a duplicate of another software module, the software development tool may remove the duplicate module in the graph. The software development tool may then determine whether any software modules depended on the duplicate module and received input from the duplicate module. If so, the software development tool may change the interconnections of the software modules such that the dependent software modules depend on, and receive input from, the other software module for which the duplicate module was a duplicate. In this way, the inefficiency created by the duplicate software module can be eliminated, while the remaining software modules can continue to execute correctly following removal of the duplicate.
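The removal and rewiring described above can be sketched on a small graph. The sketch assumes that each module is represented as an (operation, inputs) pair, that two modules are duplicates when both the operation and the inputs match, and that modules are listed with producers before consumers; the module and input names are hypothetical.

```python
def remove_duplicates(modules):
    """Remove duplicate modules and rewire their dependents.

    modules maps a module id to (operation, tuple_of_input_ids), listed so
    that producers appear before the modules that consume their outputs.
    """
    canonical = {}  # (operation, inputs) -> id of the module that is kept
    replaced = {}   # id of a removed duplicate -> id of the kept module
    result = {}
    for module_id, (operation, inputs) in modules.items():
        # Redirect inputs that pointed at an already removed duplicate.
        inputs = tuple(replaced.get(source, source) for source in inputs)
        key = (operation, inputs)
        if key in canonical:
            replaced[module_id] = canonical[key]  # duplicate: drop it
        else:
            canonical[key] = module_id
            result[module_id] = (operation, inputs)
    return result


graph = {
    "a.rate1": ("rate", ("Bank1:Curr1/Curr2",)),
    "a.rate2": ("rate", ("Bank2:Curr2/Curr3",)),
    "b.rate1": ("rate", ("Bank4:Curr1/Curr2",)),
    "b.rate2": ("rate", ("Bank2:Curr2/Curr3",)),  # same trade as a.rate2
    "a.overall": ("product", ("a.rate1", "a.rate2")),
    "b.overall": ("product", ("b.rate1", "b.rate2")),
}
deduplicated = remove_duplicates(graph)
```

After the pass, the module "b.rate2" no longer appears in the graph and "b.overall" receives its second input from "a.rate2" instead.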

In contrast, in some cases, the software development tool may in block 908 refrain from removing a duplicate or may insert duplicate software modules into the software modules. A duplicate may be inserted to remove or mitigate a bottleneck or other inefficiency identified for the software modules. The bottleneck or other inefficiency may be determined from data stored during the execution of the software modules and/or from data stored during a simulation of execution of the software modules. When multiple other software modules are waiting for one software module to finish processing and provide an output to these other software modules, this may create a delay in execution due to the inability of these other software modules to execute without the input. To attempt to decrease the overall impact of the delay, the software development tool may attempt to create duplicate modules that may be able to execute at different times or in parallel. The software development tool may then alter the dependencies of the dependent software modules, such that some of the dependent modules depend from the original module and others depend from a duplicate. This may create the possibility that only one of the duplicate modules may be delaying at any time and only a portion of the dependent modules may be delayed, waiting for the input value. When only a portion of the dependent modules are delayed at a given time, this may increase the execution speed and efficiency of the software modules.

Similarly, in some cases, the software development tool may, in block 908, create new software modules by splitting operations of a software module into multiple different parts. This may also be done in the case of a bottleneck or other inefficiency identified for the software modules. The bottleneck or other inefficiency may be determined from data stored during the execution of the software modules and/or from data stored during a simulation of execution of the software modules. For example, when a software module is to perform the same set of operations on multiple different inputs, during execution the software development tool may identify that some of the input data may be available more quickly than others of the input data, and that the beginning or ending of the execution of the software module is delayed due to the unavailability of the input data. In some such cases, the software development tool may divide the operations of this software module into multiple different modules. Each of the created modules may perform the set of operations of the original module on one or more of the inputs to the original module. By splitting the operations into multiple different modules, the operations that are to be executed on each of the inputs may be executed once those inputs are available, rather than the original module delaying execution until all of the inputs are available. In some cases such as this, a software module may include a set of operations that are to be carried out on multiple inputs, followed by other operations that are to be carried out on the results of those operations. For example, a software module may include operations to multiply each of multiple inputs by a value and other operations to sum the products of those multiplications. When the software development tool modifies a software module such as this, the software development tool may create multiple modules that each perform the multiplication operation on one or more inputs and create another module that includes the other operations to sum the products. When such modules are created, the inputs and outputs of the modules may be configured and dependencies determined such that the new modules are able to be inserted into the graph and processed along with other modules of the graph.
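The multiply-and-sum example can be sketched as follows. This is a minimal illustration only, not the tool's implementation: the function names (`original_module`, `make_multiply_module`, `sum_module`, `run_split`) are hypothetical, and inputs are modeled as a simple list in arrival order.

```python
from typing import Callable, List


def original_module(inputs: List[float], factor: float) -> float:
    # Unsplit module: cannot start until every input is available.
    return sum(x * factor for x in inputs)


def make_multiply_module(factor: float) -> Callable[[float], float]:
    # One split module: scales a single input as soon as that input arrives.
    def multiply(x: float) -> float:
        return x * factor
    return multiply


def sum_module(products: List[float]) -> float:
    # Final module: depends on the outputs of all multiply modules.
    return sum(products)


def run_split(arrivals: List[float], factor: float) -> float:
    # Process each input in arrival order, then sum the products.
    multiply = make_multiply_module(factor)
    products = [multiply(x) for x in arrivals]
    return sum_module(products)
```

In this sketch the split modules produce the same result as the original module, while each multiplication can begin as soon as its own input is present.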

In addition to or as an alternative to being able to divide a software module into multiple different software modules to increase efficiency and/or speed of execution of the software module or an overall set of software modules, in some embodiments, the software development tool may be adapted to merge software modules. Redundant software modules, which have similarities in operations, inputs, and/or outputs, may be merged in some cases in which the software development tool determines that a merge would increase execution efficiency or speed. When merging software modules, the software development tool may create a software module for execution by adding to an existing software module the instructions included in one or more other software modules. The software development tool may also configure the merged software module to include the inputs and outputs of two or more software modules that were merged to create the software module.

The software development tool may also address, in block 908, superfluous instructions that were detected in block 904. For superfluous instructions, where possible, the software development tool may attempt to remove the superfluous instructions from software modules as part of the modifying of block 908. The removal of superfluous instructions may be carried out in any suitable manner, including by editing software modules to remove the instructions from the software modules.

In embodiments in which the software development tool edits individual software modules and/or collections of software modules in the manner described above, the software development tool may do so with or without user approval. In some embodiments, when the software development tool makes a change to a software module, the software development tool may edit the software module without accepting user input regarding the change to be made to the software module. In other embodiments, however, the software development tool may request approval for changes from a user.

In block 910, the software development tool may set configuration parameters of the hardware and/or software of the multicore processing unit based on the evaluation. For example, based upon the evaluation of how the data changes, including a number of bits in the data that change when new data is received, the software development tool may set configuration parameters related to data transfer within a multicore processing unit or between a multicore processing unit and another component. The configuration parameters may relate to how data is transferred between processing cores of a multicore processing unit, between a processing core and a cache of a multicore processing unit, and/or between a multicore processing unit and shared system memory of a computing device that includes one or more multicore processing units. The configuration parameters may be set such that exchange of data is performed efficiently. The software development tool may set the configuration based, for example, on a determination of how data changes, as determined by the evaluation of the data by the software development tool. For example, when the number of bits expected to change at one time is small, the software development tool may configure the multicore processing unit to transfer only changed bits. Transferring only the changed bits may reduce the time used for transferring data. However, to transfer only the changed bits, a determination is made of which bits have changed. Making this determination may increase the time necessary to transfer data. Transferring only the changed bits may therefore increase efficiency only when the total time for the determination and the transfer is lower than the time for transferring all bits. Thus, when the number of bits expected to change at one time is large, the software development tool may configure the multicore processing unit to transfer all bits of changed data. By transferring all bits, no determination of which bits have changed need be made.
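The trade-off between transferring only changed bits and transferring all bits can be illustrated with a simple cost comparison. The sketch below is an assumption-laden illustration: the linear cost model, the function names (`changed_bits`, `choose_transfer_mode`), and the parameters `per_bit_transfer_cost` and `diff_cost` are hypothetical and are not taken from the description above.

```python
def changed_bits(old: int, new: int) -> int:
    # Count the bit positions that differ between two values (XOR popcount).
    return bin(old ^ new).count("1")


def choose_transfer_mode(expected_changed_bits: int,
                         total_bits: int,
                         per_bit_transfer_cost: float,
                         diff_cost: float) -> str:
    # Assumed cost model: delta transfer pays a fixed cost to determine
    # which bits changed, plus a per-bit cost for the changed bits only.
    delta_cost = diff_cost + expected_changed_bits * per_bit_transfer_cost
    # Full transfer pays the per-bit cost for every bit, with no diff step.
    full_cost = total_bits * per_bit_transfer_cost
    return "changed_bits_only" if delta_cost < full_cost else "all_bits"
```

Under these assumed costs, a small expected change favors the delta transfer, while a large expected change favors transferring all bits because the determination step is avoided.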

Configuration parameters produced by the software development tool may also include configuration parameters specific to a particular software module. For example, in some cases an input for a software module may be a stream of data, and the software module may execute repeatedly to process data each time the data for the input changes. In some cases, configuration information relating to such a software module may configure a scheduling facility for a multicore processing unit to permit the software module to execute only once for each change in value of the input. When the scheduling facility is configured in this way, the scheduling facility may wait to execute the software module until a change in the input is detected. In other cases, though, configuration information relating to such a software module may configure the scheduling facility to permit the software module to execute regardless of whether a value of the input has changed since the last time the software module was executed. The scheduling facility, when configured in this way, may therefore permit the software module to execute when a processing core is available to execute the software module and when other conditions for the software module's execution (such as availability of other inputs to the module) are met.
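The two scheduling behaviors for a module with a streaming input can be sketched as a single eligibility check. This is a hypothetical illustration: `ModuleConfig`, `may_execute`, and the use of an integer version counter to detect input changes are assumptions introduced here, not details of the scheduling facility described above.

```python
from dataclasses import dataclass


@dataclass
class ModuleConfig:
    # True: execute only once per change in the streaming input.
    # False: execute whenever a core and the other inputs are available.
    run_only_on_change: bool


def may_execute(config: ModuleConfig,
                last_seen_version: int,
                current_version: int,
                core_available: bool,
                other_inputs_ready: bool) -> bool:
    # Both policies require a free core and the other inputs.
    if not (core_available and other_inputs_ready):
        return False
    if config.run_only_on_change:
        # Wait until the streaming input has changed since the last run.
        return current_version != last_seen_version
    return True
```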

As mentioned above, in some embodiments an interpretation facility for a multicore processing unit may interpret the software modules evaluated by the software development tool and produce, for the software modules, instructions that can be executed by processing cores of a multicore processing unit. In some such embodiments, the interpretation facility may accept configuration input that governs a manner in which the interpretation is performed. For example, the interpretation facility may be configured to carry out the interpretation according to default rules for interpretation. Some of the default rules may identify instructions that will be output by the interpreter when certain conditions are met. For example, an interpretation facility may be configured to output Single Instruction, Multiple Data (SIMD) instructions when the interpretation facility detects that a software module or multiple software modules include an operation that is repeatedly performed on input data. The interpretation facility may be configured with such a default rule because using SIMD instructions may, in some cases, increase speed or efficiency of processing. The interpretation facility may be configurable not to use SIMD instructions by default or not to prefer SIMD instructions over other types of instructions when performing the interpretation. The software development tool may, based on an evaluation of software modules or data to be processed by the software modules, recognize in some cases that SIMD instructions may not result in greater execution speed or efficiency for a software module or a group of software modules. The software development tool may therefore, in these cases, output configuration parameters governing how the interpretation facility performs the interpretation such that the interpretation facility would not use SIMD instructions in some cases. An interpretation facility may accept other input to configure a manner in which the interpretation is performed, and the software development tool may produce output corresponding to the inputs that the interpretation facility is configured to accept. The software development tool is not limited to producing any particular configuration parameters for use by an interpretation facility.

In block 912, the software development tool generates information identifying constraints on a schedule of assigned cores and relative time of execution for software modules, which may be based on the determined dependencies of the software modules and the determined characteristics of the instructions of each software module. The information generated in block 912 may also be based on a number of software modules to execute at any one time, which may be determined during the testing of block 906. The scheduling constraint information that is generated may be in a format that is used by a scheduling facility of one or more multicore processing units, so that the scheduling facility may directly use the constraint information generated by the software development tool to determine a schedule for execution of software modules. By generating scheduling constraint information based on core assignments and dependencies, the software development tool may enable software modules to be scheduled for execution on processing cores that would execute the instructions of the software modules quickly and efficiently. Such processing cores may be specially adapted to execute the instructions, as discussed above. Additionally, the software development tool may monitor dependencies of software modules and enable software modules to be loaded onto a processing core for execution only after the execution of other software modules from which the modules depend, such that software modules are not waiting for inputs and delaying execution.

In block 914, once the configuration information is generated by the software development tool in blocks 908-914, the configuration information is output by the software development tool. The configuration information may be output in any suitable manner. In some embodiments, the configuration information may be output to a user. In other embodiments, the configuration information may be output to a storage from which the information may be provided to a management facility, or may be output directly to a scheduling facility of the management facility. The management facility and scheduling facility may be located on the same computing device as the software development tool or a different computing device.

In addition, in some embodiments, in block 914, the software development tool may output reconfiguration recommendations to a user of the software development tool. The reconfiguration recommendations may be output in any suitable manner, as embodiments are not limited in this respect. The reconfiguration recommendations may also include any suitable recommendations, including recommendations to change the software modules in ways to improve speed and/or efficiency of execution. The recommendations may include recommendations for making changes to the software modules that the software development tool was not capable of making in block 908. For example, if the software development tool is not able to remove a superfluous instruction from a software module, the software development tool may notify the user of the existence of the superfluous instruction. The recommendations may also relate to how to improve performance of the software modules by making changes to the target hardware on which the modules are to be executed. For example, the software development tool may determine in block 906 that a bottleneck in execution is being caused by particular instructions included in the software modules that are not executing efficiently on one or more hardware components of the multicore processing units. The software development tool may be configured with information regarding different types of hardware that are able to execute instructions in different ways. The software development tool may use this information to identify, in some cases, hardware components that may execute instructions more quickly or efficiently. For example, the software development tool may determine, using performance data collected by a management facility regarding execution of software modules on a multicore processing unit, that some instructions are being executed slowly. The software development tool may identify from the performance data that the instructions that are executed slowly are instructions that interact with an Arithmetic Logic Unit (ALU) of the multicore processing unit. The software development tool may also determine that the ALU is causing a bottleneck because that ALU is not arranged to execute those specific instructions quickly. The software development tool may recommend, based on the information regarding the hardware, a different ALU of a different multicore processing unit that may execute the instructions more quickly.

Once the configuration information and the reconfiguration recommendations are output in block 914, the process 900 ends. Following the process 900, in some cases, one or more multicore processing units may be configured with the configuration information, or a software developer may determine that changes should be made to the software modules based on the information provided by the software development tool, and may make the changes to the software modules rather than configure a processing unit with the software modules.

Configuration of a multicore processing unit according to configuration information produced by a software development tool may be carried out in any suitable manner, as embodiments are not limited in this respect. FIG. 10 illustrates one example of a process 1000 that may be performed in some embodiments for configuring a multicore processing unit.

Prior to the start of the process 1000 of FIG. 10, a software development tool evaluates software modules that have been created by a software developer and/or by the software development tool and data that has been specified by a software developer, and produces configuration information. In some embodiments, the software modules created by the developer/tool may have been formatted according to an intermediate language, and may be interpreted by an interpretation facility of a multicore processing unit. Configuration information created by the software development tool may then be used in the process 1000 of FIG. 10 to configure the multicore processing unit to execute the software modules evaluated by the software development tool.

The process 1000 begins in block 1002, in which a configuration facility for the multicore processing unit places the software modules of the configuration information, which were generated by the software development tool and may have been, in some embodiments, interpreted by an interpretation facility, in a storage medium accessible to the processing cores on which the modules will be executed. In block 1004, the configuration facility configures the sources of inputs of the modules and the destinations of outputs of the modules. By configuring the inputs and outputs in block 1004, the software modules may be able to retrieve inputs from particular memory locations and store outputs at particular memory locations. Additionally, the multicore processing unit may be configured with information about the inputs and outputs for each software module, such that the multicore processing unit, including a scheduling facility, is able to determine when inputs are available for a software module. The multicore processing unit may also be configured with information regarding the inputs and outputs and whether, when an input changes over time, a software module should only process inputs upon new data being available for the inputs. In some cases, the multicore processing unit may be configured to execute a software module that has a changing input when the software module is able to execute (e.g., other inputs are available or a core is available), even when new data is not available for the changing input. By configuring the scheduling facility with information about the sources of inputs for a software module, the scheduling facility is able to monitor memory locations that are to store the inputs for a software module and can detect when changes have been made to the memory locations. Such a change to a memory location may indicate that one of the inputs for a software module is available for the software module. The scheduling facility may use such information to determine whether all of the inputs for a software module are available and, if not, prevent a processing core from being configured to execute that software module until all of the inputs are available. However, as mentioned above, in some embodiments the scheduling facility may be configured to execute a software module when an input is available for the software module but the input does not reflect new data.

In block 1006, the configuration facility provides scheduling constraint information, produced by the software development tool, to a scheduling facility for the multicore processing unit. The scheduling facility may then use the scheduling constraint information to direct assignment of software modules to processing cores. The scheduling constraint information, as discussed above, may indicate that particular software modules should be assigned to particular types of processing cores and/or that particular software modules should be assigned to cores for execution after other software modules have executed. The scheduling constraint information may also indicate, in some cases, a number of software modules to execute in parallel at any one time, such as a maximum number.

In block 1008, when the configuration information includes configuration parameters to be changed, such as read/write parameters for the multicore processing unit, the configuration facility changes configuration parameters of the hardware and/or software of the multicore processing unit. The change may be carried out in any suitable manner, including by the configuration facility communicating with a software facility of the multicore processing unit or the configuration facility storing data in a register of the multicore processing unit.

Once the configuration facility changes the configuration parameters in block 1008, the process 1000 ends. Following the process 1000, the multicore processing unit is configured to execute software modules for solving a complex problem quickly and efficiently on low-cost hardware, such as the multicore processing unit that was configured.

It should be appreciated that configuration of software modules, management facilities, and/or multicore processing units for execution of the software modules may be carried out more than once. In some embodiments, rather than evaluating modules, configuring a multicore processing unit, and then executing the modules based on that configuration without reconfiguring the modules, an evaluation and configuration may be carried out multiple times. For example, in some embodiments, a software development tool may obtain and review performance data regarding execution of software modules on one or more multicore processing units and produce new configuration information based on the performance data, and the software modules may execute on the multicore processing unit(s) according to the new configuration. The software development tool may then repeat the reviewing of performance data and production of new configuration information over time. The performance data that is collected regarding execution of software modules may include performance data regarding the execution of the modules in a development environment and/or in a production environment. Accordingly, a software development tool may repeatedly change a configuration of a multicore processing unit to attempt to improve efficiency and/or speed of execution of the software modules.

FIG. 11 illustrates one process that may be carried out by a software development tool in some embodiments. The process 1100 of FIG. 11 may be similar in some ways to the processes of FIGS. 6, 8, and 9 discussed above, in that a software development tool performs an evaluation, produces configuration information, and configures one or more multicore processing units.

Prior to the start of the process 1100 of FIG. 11, a software developer identifies a problem to be solved, reviews operations that form a part of the solution to the problem and data to be processed as part of the solution, and creates types of software modules based on that review. The types of software modules that are created may be, as discussed above, templates for software modules that will form a part of the solution. The software module types may include any suitable instructions formatted in any suitable manner. The instructions may include instructions that are executable by processing cores or instructions arranged according to an intermediate language that is not executable by processing cores of the multicore processing unit(s) on which the modules are to be executed. In addition, the software developer may arrange the template software modules in a template processing chain, such that the software development tool is able to analyze the template software modules in the context of other modules with which they communicate. In the example of FIG. 11, the template processing chain identifies the template software modules as well as data to be processed by the modules and interconnections between the modules, such as input/output interconnections.

The process 1100 begins in block 1102, in which the software development tool evaluates instructions of one or more template software modules of a template processing chain and/or evaluates instructions of instances of the one or more template software modules. The evaluation of block 1102 may be carried out in any suitable manner, including according to techniques described above in connection with blocks 902-904 of FIG. 9.

In block 1104, the software modules of processing chains are executed on processing cores of one or more multicore processing units. The software development tool may, in block 1104, obtain performance data relating to this execution and review the performance data to identify bottlenecks or other inefficiencies. The execution and review of block 1104 may be carried out in any suitable manner, including according to techniques described above in connection with block 906 of FIG. 9.

In block 1106, the software development tool, based on the evaluations of blocks 1102 and 1104, modifies software modules, produces configuration information, and configures one or more multicore processing units based on the configuration information. The actions taken by the software development tool in block 1106 may be performed in any suitable manner, including according to techniques described above in connection with blocks 908-914 of FIG. 9.

Once the multicore processing unit(s) are configured in block 1106, software modules may be executed on the multicore processing unit(s). The multicore processing unit(s) may be a unit of a development environment and/or of a production environment, as embodiments are not limited in this respect. If the multicore processing unit(s) form a part of a production environment, the software modules may be processing data relating to a real-world implementation of a problem, such as by processing trading information for actual potential trades and selecting sequences of trades that are executed. In block 1108, the software development tool may monitor the execution of the software modules and evaluate performance data relating to the execution. As discussed above in connection with block 906 of FIG. 9, the software development tool may obtain performance data related to execution of software modules from one or more management facilities corresponding to the one or more multicore processing unit(s). The software development tool may evaluate the performance data and, as discussed above, produce configuration information based on the evaluation of the performance data. For example, if the software development tool determines from the performance data that one or more instructions are executing slowly, the software development tool may take steps to improve the execution speed or efficiency. Accordingly, in block 1110, the software development tool may again modify software modules, produce configuration information, and configure one or more multicore processing units based on the configuration information. The actions taken by the software development tool in block 1110 may be performed in any suitable manner, including according to techniques described above in connection with blocks 908-914 of FIG. 9.

Once the software development tool modifies the modules, produces the configuration information, and configures the multicore processing unit(s), the process 1100 returns to block 1108, in which the modules are executed and the software development tool monitors execution. Accordingly, the software development tool may continue to monitor execution of the modules and change a configuration of a multicore processing unit over time, even as the modules are executed in a production environment.

During the continued monitoring and reconfiguration of the process 1100 of FIG. 11, the software development tool may make changes and determine whether the changes resulted in an improvement to execution speed or efficiency. In some cases, a change made by the software development tool may not result in an improvement to execution speed or efficiency. For example, in some cases, a change made by the software development tool may mistakenly result in a drop in execution speed or efficiency. The software development tool may therefore store performance data for previous configurations produced by the software development tool and, in block 1108, compare performance data for a current configuration to performance data for one or more previous configurations. The software development tool may also store information regarding changes previously made to configurations. If the software development tool determines in block 1108 from the comparison that a new configuration has resulted in a drop in execution speed or efficiency, the software development tool may in block 1110 undo the changes made to the configuration.
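The compare-and-undo behavior of blocks 1108 and 1110 can be sketched as a loop that keeps a history of configurations and their measured performance. This is an illustrative sketch only: `tune_with_rollback`, the dictionary representation of a configuration, and the `measure` callback (assumed to return an execution time, lower being better) are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def tune_with_rollback(initial: Config,
                       candidates: List[Config],
                       measure: Callable[[Config], float]
                       ) -> Tuple[Config, List[Tuple[Config, float]]]:
    current = dict(initial)
    best_time = measure(current)
    # Stored performance data for every configuration that was tried.
    history: List[Tuple[Config, float]] = [(dict(current), best_time)]
    for candidate in candidates:
        elapsed = measure(candidate)
        history.append((dict(candidate), elapsed))
        if elapsed < best_time:
            # The change improved execution time, so keep it.
            current, best_time = dict(candidate), elapsed
        # Otherwise the change is undone by retaining the previous config.
    return current, history
```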

The software development tool is not limited to selecting configuration changes to be made during the loop of FIG. 11 in any particular manner. In some embodiments, the software development tool may make changes to a configuration based on the performance data and a determination that a change to configuration may result in an improvement in performance. In some embodiments, however, the software development tool may be arranged to iteratively attempt different available configurations to determine when a configuration results in an improvement in execution speed and/or efficiency for the software modules. For example, in some embodiments, the software development tool may iterate through multiple different permutations of one or more possible configuration settings for a management facility and/or multicore processing unit. In some embodiments in which the software development tool iterates through multiple permutations, the software development tool may iterate through all permutations of possible configuration settings for a management facility and/or multicore processing unit. The software development tool may then determine, based on performance data collected during execution of the different permutations, which configuration provides the highest execution speed and/or efficiency for software modules. However, it should be appreciated that embodiments are not limited to implementing a software development tool that selects configuration changes in any particular manner.
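Iterating through all permutations of configuration settings amounts to an exhaustive search over the Cartesian product of the possible values. A minimal sketch, assuming a hypothetical `measure` callback that returns an execution time for a configuration and hypothetical setting names supplied by the caller:

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple


def best_configuration(settings: Dict[str, List],
                       measure: Callable[[Dict], float]
                       ) -> Tuple[Optional[Dict], float]:
    # Fix an order for the setting names so each combination maps to a dict.
    names = sorted(settings)
    best: Optional[Dict] = None
    best_time = float("inf")
    # itertools.product enumerates every combination of setting values.
    for values in itertools.product(*(settings[n] for n in names)):
        config = dict(zip(names, values))
        elapsed = measure(config)
        if elapsed < best_time:
            best, best_time = config, elapsed
    return best, best_time
```

The number of combinations grows multiplicatively with each added setting, so an exhaustive search of this kind is practical only when the set of possible settings is small.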

As discussed above, configuration information, including scheduling constraint information, may be in any suitable format and include any suitable information. Embodiments are not limited to operating with any particular type or format of scheduling constraint information, or to operating with multicore processing units that use any particular type of scheduling constraint information. FIG. 12 illustrates one example of a manner in which software modules may be scheduled for execution on processing cores of one or more multicore processing units.

In the example of FIG. 12, scheduling constraint information identifies a time at which software modules are to be assigned for execution to processing cores in terms of a wave to which the software module is assigned. A “wave” may include a group of software modules that are to be executed in parallel at the same time on processing cores of one or more multicore processing units. A wave may include any suitable number of software modules, including the same number of software modules as there are processing cores to which software modules may be assigned. Additionally, a wave may include any suitable types of software modules. In some embodiments, when software modules are assigned to waves, the software modules may be assigned based on type. For example, the modules may be assigned such that all modules of a first type are assigned to execute first, then once all modules of that type have been executed, all modules of a second type may be executed. Executing all modules of a certain type may, in some embodiments, include executing the modules in multiple different waves, such as when the number of modules of a certain type is larger than the number of processing cores. In other cases, however, modules may be assigned to waves to account for dependencies between modules (to prevent a dependent module from being executed before a module on which it depends), and modules of different types may be executed in the same wave.

A software development tool operating in accordance with techniques described herein, or any other suitable human or software entity, may assign software modules to waves based on any suitable factors. For example, software modules may be assigned to waves such that when a first software module is dependent on a second software module, the first software module is assigned to a later wave than the second software module. By assigning the first software module to a later wave than the second software module, when the wave to which the second software module is assigned finishes executing, the input on which the first software module depends may be available before the wave to which the first software module is assigned begins executing.
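One way to assign modules to waves so that every dependent module lands in a later wave than the modules on which it depends is a level-by-level topological ordering, with each wave capped at the number of processing cores. The sketch below illustrates that general technique under stated assumptions; `assign_waves` and the dictionary-of-sets dependency format are hypothetical and are not the format of the scheduling constraint information described above.

```python
from typing import Dict, List, Set


def assign_waves(deps: Dict[str, Set[str]], num_cores: int) -> List[List[str]]:
    # deps maps each module name to the set of modules it depends on.
    remaining = {module: set(d) for module, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A module is ready once all of its dependencies have executed.
        ready = sorted(m for m, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        # A wave holds at most one module per processing core.
        wave = ready[:num_cores]
        waves.append(wave)
        done.update(wave)
        for m in wave:
            del remaining[m]
    return waves
```

For example, with modules `a` and `b` having no dependencies, `c` depending on both, and `d` depending on `c`, two cores yield the waves `[a, b]`, `[c]`, `[d]`.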

The process 1200 of FIG. 12 illustrates one technique for operating a scheduling facility of a multicore processing unit to schedule software modules for execution on processing cores based on waves to which the software modules have been assigned. Prior to the start of the process 1200, software modules are created by a software developer and/or a software development tool operating according to techniques described herein. The software development tool produces configuration information for the software modules, which includes scheduling constraint information that assigns the software modules to waves. The scheduling constraint information is provided to a scheduling facility and the software modules are stored in a location from which the software modules may be transferred to the processing cores to which the software modules are to be assigned.

The process 1200 begins in block 1202, in which input data to be processed by a set of software modules for parallel execution on processing cores is provided to one or more multicore processing units and to the software modules to be executed on the processing cores. The scheduling facility, in response to detecting that input data is available for the first set of software modules to be executed, triggers execution of a first wave of software modules. The software modules then execute on the processing cores to which they have been assigned and, as a result of execution, write outputs to memory. In block 1206, the scheduling facility triggers execution of a second wave of modules. The scheduling facility may trigger execution of second wave modules upon determining that the software modules of the first wave have finished executing, and that all of the inputs on which software modules of the second wave depend are available for processing by the software modules of the second wave. As part of triggering execution of the second wave modules, the scheduling facility causes a context switch for each of the processing cores on which the first wave of software modules were executed and the second wave of software modules are to execute. As part of the context switch of block 1206, instructions for the first wave of software modules are removed from the processing cores and instructions for the second wave of software modules are made available to the processing cores, including by being stored in a storage accessible to the cores (e.g., an on-chip cache). In addition, data processed by the first wave of software modules is removed from the processing cores, and data to be processed by the second wave of software modules is made accessible to the processing cores.

The scheduling facility may continue, in block 1208, triggering execution of successive waves of software modules and switching contexts of processing cores to different software modules until each of the software modules to be executed on the processing cores has been executed. In block 1210, once the last wave of software modules has been assigned for execution on the processing cores, one of the modules or another software facility executing on the processing units may evaluate outputs of the software modules that have been executed and identify a solution to the problem for which the software modules were executing on the processing cores of the multicore processing units. In block 1212, once the solution to the problem has been determined, the solution is output from the multicore processing units to any suitable destination.
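The wave-by-wave triggering described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not an implementation of the scheduling facility: a thread pool stands in for the processing cores, a dictionary stands in for memory, the context switch between waves is not modeled, and the name run_waves and the example modules are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_waves(waves, input_data, num_cores=4):
    """Execute waves of modules in order and return the final memory.

    `waves` is a list of dicts mapping a module name to a callable that
    takes a snapshot of memory and returns the module's output.
    """
    memory = dict(input_data)
    with ThreadPoolExecutor(max_workers=num_cores) as cores:
        for wave in waves:
            # Trigger every module of the current wave for parallel execution.
            futures = {
                name: cores.submit(module, dict(memory))
                for name, module in wave.items()
            }
            # Wait for the whole wave to finish and write its outputs to
            # memory; only then is the next wave triggered.
            for name, future in futures.items():
                memory[name] = future.result()
    return memory


# Hypothetical modules: the second wave depends on both first-wave outputs.
example_waves = [
    {"double": lambda m: m["x"] * 2, "square": lambda m: m["x"] ** 2},
    {"total": lambda m: m["double"] + m["square"]},
]
result = run_waves(example_waves, {"x": 3})
```

Collecting every result of a wave before submitting the next wave plays the role of the determination that the software modules of the earlier wave have finished executing and that their outputs are available.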

In some cases, following output of a solution to the problem to which the software modules relate, execution of the software modules may end. This may be the case, for example, where the software modules are intended to be executed once to determine a single solution to a problem. In other cases, however, such as in the example of FIG. 12, the software modules may relate to a problem that is designed to be run successively on different pieces of input data received over time, such as based on a stream of data received over time. Accordingly, as illustrated in FIG. 12, following output of the solution in block 1212, the process 1200 returns to block 1202 to receive new input data.

It should be appreciated from the foregoing that techniques described herein may be used with any suitable software modules relating to any suitable complex problem. Embodiments are not limited to operating with any particular problem or type of problem, or evaluating any particular data or type of data, or executing any particular instructions or type of instructions. As in examples described above, techniques described herein may be used in a financial setting to perform operations related to a financial arbitrage problem, for identifying desirable sequences of trades to carry out in a financial arbitrage setting.

FIG. 13 illustrates an example of an overall process for executing software modules related to financial arbitrage on multicore processing units. In the example of FIG. 13, the multicore processing units include one or more central processing units and one or more graphics processing units. The central processing units and graphics processing units may be components of a computing device that is located in a computer system similar to the one illustrated in FIG. 2. Accordingly, the computing device including the multicore processing units on which the software modules may execute may be communicatively coupled to a bridge that may be instructed to carry out financial trades.

Prior to the start of the process 1300 of FIG. 13, the software developer creates software modules and the software modules are evaluated by a software development tool. In accordance with techniques described herein, the software development tool may produce configuration information for multicore processing units based on the software modules created by the software developer. Configuration information, including scheduling constraint information, produced by the software development tool may be used to configure the multicore processing units of the computing device.

The process 1300 begins in block 1302, in which processing cores of the graphics processing unit are configured by a configuration facility to execute software modules for financial arbitrage. In block 1304, the central processing unit of the computing device is configured with instructions for acting as a conduit for transferring trading information and trading instructions between the graphics processing unit and the bridge of the computing system.

In block 1306, the central processing unit receives trading information transmitted to the computing device by the bridge and, in block 1308, the central processing unit provides the received trading information to the graphics processing unit. By providing the trading information to the graphics processing unit, the trading information is made available to software modules that will execute on processing cores of the graphics processing unit. Accordingly, in block 1310, software modules are executed in successive waves on the processing cores of the graphics processing unit, to process the trading information received from the bridge. The software modules that are executed by the processing cores in block 1310 may include any suitable software modules executing any suitable instructions. In some embodiments, for example, the software modules executed in block 1310 may include software modules of the types illustrated in FIG. 7C.

As a result of processing the trading information received from the bridge, the software modules executed on the processing cores of the graphics processing unit may collectively select a sequence of potential trades to be executed that has the highest potential for profit out of the sequences of potential trades identified by the trading information received from the bridge. In block 1312, the identification of the selected sequence of potential trades having the highest potential for profit is received at the central processing unit from the graphics processing unit. In response, the central processing unit, in block 1314, creates an instruction identifying that the sequence of potential trades should be executed, and transmits the instruction to the bridge.

After the instruction has been transmitted to the bridge in block 1314, process 1300 returns to block 1306, in which the central processing unit receives new trading information and again, in block 1308, provides the trading information to the graphics processing unit for processing by the software modules. In some embodiments, the central processing unit and graphics processing unit of the computing device may continue processing trading information and issuing instructions for sequences of potential trades to be executed for as long as trading information is received at the bridge and communicated to the central processing unit.
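The loop of blocks 1306 through 1314 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: an ordinary function stands in for the software modules executing on the graphics processing unit, an iterable and a callback stand in for the bridge, and potential for profit is assumed, for this sketch only, to be the product of the exchange rates along a candidate sequence. The names select_best_sequence and run_conduit, the message format, and the example rates are hypothetical.

```python
from math import prod


def select_best_sequence(trading_info):
    """Stand-in for blocks 1310 and 1312.

    `trading_info` maps each candidate sequence of potential trades
    (a tuple of labels) to the exchange rates along that sequence.
    Assumption for this sketch: profit potential is the rate product.
    """
    return max(trading_info, key=lambda seq: prod(trading_info[seq]))


def run_conduit(incoming, send_to_bridge):
    """Central processing unit acting as a conduit to and from the bridge."""
    for trading_info in incoming:                   # blocks 1306 and 1308
        if not trading_info:
            continue                                # nothing to evaluate
        best = select_best_sequence(trading_info)   # blocks 1310 and 1312
        send_to_bridge({"execute": best})           # block 1314


# Hypothetical stream holding a single batch of trading information.
stream = [
    {
        ("USD>EUR", "EUR>USD"): [0.9, 1.12],
        ("USD>JPY", "JPY>USD"): [150.0, 0.0066],
    }
]
sent = []
run_conduit(stream, sent.append)
```

The loop ends when the stream is exhausted, which corresponds to processing continuing only for as long as trading information is received.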

In the examples of FIGS. 2 and 13, the bridge was illustrated and described as being implemented on a different computing device than the computing device including the multicore processing units executing the software modules. It should be appreciated, however, that embodiments are not limited to implementing a bridge and multicore processing unit(s) executing the software modules on different computing devices. For example, a computing device may implement bridge functionality and may additionally include one or more multicore processing units on which software modules may be executed to evaluate trading information received by the bridge and identify desirable sequences of potential trades to be executed. In some embodiments that process trading information, the bridge and the multicore processing units may be implemented together or separately using one or more rack-mounted servers that are co-located in a server room with devices that distribute trading information on behalf of one or more counterparties to potential trades.

Techniques operating according to the principles described herein may be implemented in any suitable manner. Included in the discussion above are a series of flow charts showing the steps and acts of various processes that configure low-cost hardware to execute operations for complex problems quickly and efficiently. The processing and decision blocks of the flow charts above represent steps and acts that may be included in algorithms that carry out these various processes. Algorithms derived from these processes may be implemented as software integrated with and directing the operation of one or more single- or multi-purpose processors, may be implemented as functionally-equivalent circuits such as a Digital Signal Processing (DSP) circuit, a Field-Programmable Gate Array (FPGA), or an Application-Specific Integrated Circuit (ASIC), or may be implemented in any other suitable manner. It should be appreciated that the flow charts included herein do not depict the syntax or operation of any particular circuit or of any particular programming language or type of programming language. Rather, the flow charts illustrate the functional information one skilled in the art may use to fabricate circuits or to implement computer software algorithms to perform the processing of a particular apparatus carrying out the types of techniques described herein. It should also be appreciated that, unless otherwise indicated herein, the particular sequence of steps and/or acts described in each flow chart is merely illustrative of the algorithms that may be implemented and can be varied in implementations and embodiments of the principles described herein.

Accordingly, in some embodiments, the techniques described herein may be embodied in computer-executable instructions implemented as software, including as application software, system software, firmware, middleware, embedded code, or any other suitable type of computer code. Such computer-executable instructions may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.

When techniques described herein are embodied as computer-executable instructions, these computer-executable instructions may be implemented in any suitable manner, including as a number of functional facilities, each providing one or more operations to complete execution of algorithms operating according to these techniques. A “functional facility,” however instantiated, is a structural component of a computer system that, when integrated with and executed by one or more computers, causes the one or more computers to perform a specific operational role. A functional facility may be a portion of or an entire software element. For example, a functional facility may be implemented as a function of a process, or as a discrete process, or as any other suitable unit of processing. If techniques described herein are implemented as multiple functional facilities, each functional facility may be implemented in its own way; all need not be implemented the same way. Additionally, these functional facilities may be executed in parallel and/or serially, as appropriate, and may pass information between one another using a shared memory on the computer(s) on which they are executing, using a message passing protocol, or in any other suitable way.

Generally, functional facilities include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the functional facilities may be combined or distributed as desired in the systems in which they operate. In some implementations, one or more functional facilities carrying out techniques herein may together form a complete software package. These functional facilities may, in alternative embodiments, be adapted to interact with other, unrelated functional facilities and/or processes, to implement a software program application.

Some exemplary functional facilities have been described herein for carrying out one or more tasks. It should be appreciated, though, that the functional facilities and division of tasks described are merely illustrative of the type of functional facilities that may implement the exemplary techniques described herein, and that embodiments are not limited to being implemented in any specific number, division, or type of functional facilities. In some implementations, all functionality may be implemented in a single functional facility. It should also be appreciated that, in some implementations, some of the functional facilities described herein may be implemented together with or separately from others (i.e., as a single unit or separate units), or some of these functional facilities may not be implemented.

Computer-executable instructions implementing the techniques described herein (when implemented as one or more functional facilities or in any other manner) may, in some embodiments, be encoded on one or more computer-readable media to provide functionality to the media. Computer-readable media include magnetic media such as a hard disk drive, optical media such as a Compact Disk (CD) or a Digital Versatile Disk (DVD), a persistent or non-persistent solid-state memory (e.g., Flash memory, Magnetic RAM, etc.), or any other suitable storage media. Such a computer-readable medium may be implemented in any suitable manner, including as computer-readable storage media 1406 of FIG. 14 described below (i.e., as a portion of a computing device 1400) or as a stand-alone, separate storage medium. As used herein, “computer-readable media” (also called “computer-readable storage media” or “storage media”) refers to tangible storage media. Tangible storage media are non-transitory and have at least one physical, structural component. In a “computer-readable medium,” as used herein, at least one physical, structural component has at least one physical property that may be altered in some way during a process of creating the medium with embedded information, a process of recording information thereon, or any other process of encoding the medium with information. For example, a magnetization state of a portion of a physical structure of a computer-readable medium may be altered during a recording process.

Further, some techniques described above comprise acts of storing information (e.g., data and/or instructions) in certain ways for use by these techniques. In some implementations of these techniques, such as implementations where the techniques are implemented as computer-executable instructions, the information may be encoded on a computer-readable storage medium. Where specific structures are described herein as advantageous formats in which to store this information, these structures may be used to impart a physical organization of the information when encoded on the storage medium. These advantageous structures may then provide functionality to the storage medium by affecting operations of one or more processors interacting with the information; for example, by increasing the efficiency of computer operations performed by the processor(s).

In some, but not all, implementations in which the techniques may be embodied as computer-executable instructions, these instructions may be executed on one or more suitable computing device(s) operating in any suitable computer system, including the exemplary computer system of FIG. 1, or one or more computing devices (or one or more processors of one or more computing devices) may be programmed to execute the computer-executable instructions. A computing device or processor may be programmed to execute instructions when the instructions are stored in a manner accessible to the computing device/processor, such as in a local memory (e.g., an on-chip cache or instruction register, a computer-readable storage medium accessible via a bus, a computer-readable storage medium accessible via one or more networks and accessible by the device/processor, etc.). Functional facilities that comprise these computer-executable instructions may be integrated with and direct the operation of a single multi-purpose programmable digital computer apparatus, a coordinated system of two or more multi-purpose computer apparatuses sharing processing power and jointly carrying out the techniques described herein, a single computer apparatus or coordinated system of computer apparatuses (co-located or geographically distributed) dedicated to executing the techniques described herein, one or more Field-Programmable Gate Arrays (FPGAs) for carrying out the techniques described herein, or any other suitable system.

FIG. 14 illustrates one exemplary implementation of a computing device in the form of a computing device 1400 that may be used in a system implementing the techniques described herein, although others are possible. It should be appreciated that FIG. 14 is intended neither to be a depiction of necessary components for a computing device to operate in accordance with the principles described herein, nor a comprehensive depiction.

Computing device 1400 may comprise at least one processor 1402 that may include one or more multicore processors, a network adapter 1404, and computer-readable storage media 1406. Computing device 1400 may be, for example, a desktop or laptop personal computer, a server, a rack-mounted computer, or any other suitable computing device. The at least one processor 1402 may include one or more multicore processing units, which may include central processing units and/or graphics processing units. Network adapter 1404 may be any suitable hardware and/or software to enable the computing device 1400 to communicate wired and/or wirelessly with any other suitable computing device over any suitable computing network. The computing network may include wireless access points, switches, routers, gateways, and/or other networking equipment as well as any suitable wired and/or wireless communication medium or media for exchanging data between two or more computers, including the Internet. Computer-readable storage media 1406 may be adapted to store data to be processed and/or instructions to be executed by processor 1402. Processor 1402 enables processing of data and execution of instructions. The data and instructions may be stored on the computer-readable storage media 1406.

The data and instructions stored on computer-readable storage media 1406 may comprise computer-executable instructions implementing techniques which operate according to the principles described herein. In the example of FIG. 14, computer-readable storage media 1406 stores computer-executable instructions implementing various facilities and storing various information as described above. Computer-readable storage media 1406 may store an evaluation facility 1408 that may operate as a software development tool in accordance with techniques described herein. The evaluation facility 1408 may perform any suitable operations to evaluate software modules for execution on processing cores of one or more multicore processing units. The computer-readable storage media 1406 may also include a scheduling facility 1410 that operates according to scheduling constraint information to assign software modules to processing cores of one or more multicore processing units for execution. The computer-readable storage media 1406 may additionally store software modules 1412 for execution on processing cores, and may store a configuration facility 1414 to configure one or more multicore processing units for executing the software modules 1412 based on configuration information generated by the evaluation facility 1408.

While not illustrated in FIG. 14, a computing device may additionally have one or more components and peripherals, including input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computing device may receive input information through speech recognition or in other audible format.

Embodiments have been described where the techniques are implemented in circuitry and/or computer-executable instructions. It should be appreciated that some embodiments may be in the form of a method, of which at least one example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

Various aspects of the embodiments described above may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any embodiment, implementation, process, feature, etc. described herein as exemplary should therefore be understood to be an illustrative example and should not be understood to be a preferred or advantageous example unless otherwise indicated.

Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the principles described herein. Accordingly, the foregoing description and drawings are by way of example only.

What is claimed is:
 1. An apparatus comprising: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: generating a plurality of processing chains for parallel execution on at least one processing unit comprising a plurality of processing cores, the generating comprising generating each of the plurality of processing chains according to a template processing chain and a specification of data to be processed by the plurality of processing chains, the template processing chain comprising a plurality of software modules in which at least one software module receives one or more inputs that are one or more outputs of one or more other software modules of the plurality of software modules, the specification of data defining a plurality of inputs to be processed by the plurality of processing chains, wherein the generating comprises generating the plurality of processing chains such that each processing chain of the plurality of processing chains is adapted to cause the at least one processing unit to perform operations defined by the plurality of software modules on at least a portion of the data defined by the specification of data; selecting a configuration for the at least one processing unit for executing the plurality of processing chains on the plurality of processing cores, the selecting being carried out based at least in part on an execution efficiency of the plurality of processing chains when the at least one processing unit is configured according to the configuration; and producing configuration information for configuring the at least one processing unit according to the configuration, wherein: selecting the configuration comprises selecting a configuration in which software modules of each of the plurality of processing chains are executed on the plurality of processing cores according to a schedule; the plurality of processing cores comprises a plurality of types of processing core; at least one first processing core of a first type of processing core comprises at least one first component permitting the at least one first processing core to execute a first set of one or more instructions in a manner different from a manner in which processing cores of other types execute the first set of one or more instructions; at least one second processing core of a second type of processing core does not comprise the at least one first component; and the method further comprises generating scheduling information for the configuration at least in part by assigning software modules of the plurality of processing chains to types of processing core, wherein the assigning comprises: evaluating executable instructions of a first software module of the software modules, identifying, based on the evaluating, that the first type of processing core is suitable for executing the executable instructions of the first software module, and storing information indicating that the first software module is to be executed on a processing core of the first type.
 2. The apparatus of claim 1, wherein selecting the configuration based at least in part on execution efficiency comprises selecting the configuration based at least in part on amount of time the plurality of processing cores will spend executing operations for the plurality of processing chains and/or the amount of time the plurality of processing cores will spend not executing operations for the plurality of processing chains when the at least one processing unit is configured according to the configuration.
 3. The apparatus of claim 1, wherein evaluating executable instructions of the first software module comprises evaluating the executable instructions to determine one or more characteristics of the executable instructions selected from a group of characteristics consisting of: a number of storage access instructions included in the executable instructions, a type of storage accessed by the executable instructions, and a number of logic instructions included in the executable instructions.
 4. The apparatus of claim 1, wherein: a number of software modules included in the plurality of processing chains is larger than a number of processing cores of the plurality of processing cores; the scheduling information comprises information identifying a relative time at which each of the software modules included in the plurality of processing chains will execute on a processing core of the plurality of processing cores; and the method further comprises generating the scheduling information for the configuration at least in part by assigning each software module of the plurality of processing chains to execute on a processing core of the plurality of processing cores at a relative time, wherein the assigning comprises: identifying, in software modules of the plurality of software modules, a second software module that receives as an input data generated as an output of a first software module, and storing information indicating that the plurality of processing cores should execute the first software module prior to executing the second software module.
 5. The apparatus of claim 1, wherein producing the configuration information comprises producing scheduling information in a format in which a scheduling tool for the at least one processing unit is adapted to receive scheduling information.
 6. The apparatus of claim 1, wherein: the specification of data defines a plurality of types of data; and generating the plurality of processing chains comprises generating a processing chain that corresponds to a plurality of permutations of the plurality of types of data, each permutation of the plurality of permutations comprising more than one of the plurality of types of data, the plurality of processing chains each being adapted to accept a plurality of inputs for the types of data of the corresponding permutation and to cause the at least one processing unit to perform operations defined by the plurality of software modules on the plurality of inputs.
 7. The apparatus of claim 6, wherein identifying the plurality of permutations comprises generating all permutations of the plurality of types of data.
 8. The apparatus of claim 6, wherein the method further comprises: receiving a user input triggering generation of the plurality of processing chains; and generating the plurality of processing chains according to the template processing chain without receiving further user input during the generating.
 9. The apparatus of claim 1, wherein generating the plurality of processing chains for parallel execution on the at least one processing unit comprising the plurality of processing cores comprises generating at least some of the plurality of processing chains for execution on at least one generally-programmable graphics processing unit.
 10. The apparatus of claim 1, wherein generating the plurality of processing chains for parallel execution on the at least one processing unit comprising the plurality of processing cores comprises generating at least some of the plurality of processing chains for execution on at least one field-programmable gate array (FPGA).
 11. An apparatus comprising: at least one processor; at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: evaluating a plurality of software modules for execution on at least one processing unit comprising a plurality of processing cores, a first portion of the plurality of software modules receiving as inputs outputs generated by a second portion of the plurality of software modules, the evaluating comprising evaluating the plurality of software modules to identify at least one change to the plurality of software modules to increase an execution efficiency of the plurality of software modules when executed in parallel on the at least one processing unit comprising the plurality of processing cores; and in response to the evaluating, automatically editing the plurality of software modules to implement the at least one change to increase the execution efficiency of the plurality of software modules, wherein: the plurality of software modules comprises a first software module that provides an output to at least one second software module and at least one third software module; and automatically editing the plurality of software modules comprises creating a duplicate software module for the first software module, configuring the first software module to provide the output to the at least one second software module and not the at least one third software module, and configuring the duplicate software module to provide the output to the at least one third software module.
 12. The apparatus of claim 11, wherein: evaluating the plurality of software modules comprises evaluating software modules of the plurality of software modules to identify a software module that provides an output to a plurality of software modules, and creating the duplicate software module is performed in response to determining, based on the evaluating, that the first software module of the software modules generated during the generating of the plurality of processing chains provides an output to the at least one second software module and the at least one third software module.
 13. An apparatus comprising: at least one processor; and at least one storage medium having encoded executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: evaluating operations of a plurality of software modules to be executed in parallel on a plurality of processing cores; evaluating characteristics of data to be processed by the plurality of software modules; and based at least in part on the evaluation of the operations and the evaluation of the characteristics, producing configuration information for configuring the plurality of processing cores to execute the plurality of software modules, wherein producing the configuration information comprises producing configuration information based at least in part on differences between a first type of processing core of the plurality of processing cores and a second type of processing core of the plurality of processing cores, wherein: the plurality of processing cores comprises at least one first processing core of the first type, each of the at least one first processing core of the first type comprising at least one first component permitting the at least one first processing core to execute a first set of one or more instructions in a manner different from a manner in which processing cores of the second type execute the first set of one or more instructions; the plurality of processing cores comprises at least one second processing core of the second type, each of the at least one second processing core not comprising the at least one first component; the plurality of software modules comprise a first software module including instructions that, when executed by a processing core of the plurality of processing cores, cause the processing core to perform an operation; evaluating operations of the plurality of software modules comprises evaluating the instructions to cause a processing core to perform the operation; and producing the configuration information comprises: determining that instructions of the first set of instructions would cause a processing core to perform the operation; and editing the first software module to replace at least a portion of the instructions to cause a processing core to perform the operation with instructions of the first set of instructions.
 14. The apparatus of claim 13, wherein the at least one first processing core is one or more processing cores of a graphics processing unit and the at least one second processing core is one or more processing cores of a central processing unit.
 15. The apparatus of claim 13, wherein the at least one first processing core is one or more processing cores of a graphics processing unit and the at least one second processing core is one or more other processing cores of the graphics processing unit.
 16. The apparatus of claim 13, wherein the at least one first processing core is one or more processing cores of a first multicore processing unit and the at least one second processing core is one or more processing cores of a second multicore processing unit different from the first multicore processing unit.
 17. The apparatus of claim 13, wherein: each of the at least one second processing core of the second type comprises at least one second component permitting the at least one second processing core to execute a second set of one or more instructions in a manner different from a manner in which processing cores of the first type execute the second set of one or more instructions; the at least one first processing core of the first type does not comprise the at least one second component.
 18. The apparatus of claim 13, wherein: the at least one first component of the at least one first processing core of the first type is a plurality of first components; and the plurality of first components collectively permit the at least one first processing core of the first type to execute the first set of instructions in the manner different from the manner in which processing cores of the second type execute the first set of instructions.
 19. The apparatus of claim 13, wherein: the plurality of software modules comprise a first software module including a first instruction of the first set of instructions; evaluating operations of the plurality of software modules comprises evaluating the first instruction; and producing configuration information based at least in part on differences between the first type of processing core and the second type of processing core comprises producing scheduling information indicating that the first software module should be executed on one of the at least one first processing core of the first type.
 20. The apparatus of claim 19, wherein producing the configuration information comprises: determining, during the evaluation of the operations of the plurality of processing cores, that the first software module comprises the first instruction of the first set of instructions; and in response to determining that the first software module comprises the first instruction, producing the scheduling information indicating that the first software module should be executed on one of the at least one first processing core of the first type.
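
By way of non-limiting illustration only, the scheduling determination recited in claims 19 and 20 might be sketched as follows. The sketch is an assumption-laden example and not a definitive implementation: the names Module, FIRST_SET and produce_scheduling_info, the opcode names, and the labels "first_type" and "any_type" are all hypothetical and do not appear in the claims. The claims do not state where a module lacking an instruction of the first set should execute, so the sketch simply leaves such a module unconstrained.

```python
from dataclasses import dataclass

# Hypothetical representation of a software module: a name plus the
# opcode names of the instructions it contains.
@dataclass(frozen=True)
class Module:
    name: str
    instructions: tuple

# Hypothetical "first set of instructions": opcodes that cores of the
# first type execute differently because of a component that cores of
# the second type lack. The opcode names are illustrative only.
FIRST_SET = frozenset({"fma", "rsqrt"})

def produce_scheduling_info(modules, first_set=FIRST_SET):
    """Return a mapping from module name to a core-type constraint.

    A module containing any instruction of the first set is marked for
    execution on a first-type core; other modules are left unconstrained
    (an assumption made for this sketch).
    """
    schedule = {}
    for module in modules:
        # Evaluate the module's instructions against the first set.
        uses_first_set = any(op in first_set for op in module.instructions)
        schedule[module.name] = "first_type" if uses_first_set else "any_type"
    return schedule

modules = [
    Module("price_leg", ("load", "fma", "store")),
    Module("aggregate", ("load", "add", "store")),
]
schedule = produce_scheduling_info(modules)
print(schedule)
```

In this sketch the scheduling information is a plain dictionary; an actual configuration tool could emit any format that the multicore processing units accept.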